[jira] [Updated] (HIVE-26166) Make website GDPR compliant and enable matomo analytics

2024-10-04 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-26166:
---
Summary: Make website GDPR compliant and enable matomo analytics  (was: 
Make website GDPR compliant)

> Make website GDPR compliant and enable matomo analytics
> ---
>
> Key: HIVE-26166
> URL: https://issues.apache.org/jira/browse/HIVE-26166
> Project: Hive
>  Issue Type: Task
>  Components: Website
>Reporter: Stamatis Zampetakis
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Per the email that was sent out from privacy we need to make the Hive website 
> GDPR compliant. 
>  # The link to privacy policy needs to be updated from 
> [https://hive.apache.org/privacy_policy.html] to 
> [https://privacy.apache.org/policies/privacy-policy-public.html]
>  # The google analytics service must be removed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26166) Make website GDPR compliant

2024-10-04 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-26166.

Fix Version/s: 4.1.0
 Assignee: Martijn Visser
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive-site/commit/4f50371c738ede37571c6ae5a244994f8f670c95.

Thanks for the PR [~martijnvisser]!

> Make website GDPR compliant
> ---
>
> Key: HIVE-26166
> URL: https://issues.apache.org/jira/browse/HIVE-26166
> Project: Hive
>  Issue Type: Task
>  Components: Website
>Reporter: Stamatis Zampetakis
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Per the email that was sent out from privacy we need to make the Hive website 
> GDPR compliant. 
>  # The link to privacy policy needs to be updated from 
> [https://hive.apache.org/privacy_policy.html] to 
> [https://privacy.apache.org/policies/privacy-policy-public.html]
>  # The google analytics service must be removed





[jira] [Commented] (HIVE-26166) Make website GDPR compliant

2024-10-04 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886912#comment-17886912
 ] 

Stamatis Zampetakis commented on HIVE-26166:


After HIVE-26565, the website is already GDPR compliant since Google Analytics 
has been removed and the privacy link is up to date. The PR #1 linked to this 
ticket is complementary work that enables analytics via the ASF Matomo 
instance running at https://analytics.apache.org/

> Make website GDPR compliant
> ---
>
> Key: HIVE-26166
> URL: https://issues.apache.org/jira/browse/HIVE-26166
> Project: Hive
>  Issue Type: Task
>  Components: Website
>Reporter: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Per the email that was sent out from privacy we need to make the Hive website 
> GDPR compliant. 
>  # The link to privacy policy needs to be updated from 
> [https://hive.apache.org/privacy_policy.html] to 
> [https://privacy.apache.org/policies/privacy-policy-public.html]
>  # The google analytics service must be removed





[jira] [Created] (HIVE-28558) Drop HCatalog download page from the website

2024-10-04 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28558:
--

 Summary: Drop HCatalog download page from the website
 Key: HIVE-28558
 URL: https://issues.apache.org/jira/browse/HIVE-28558
 Project: Hive
  Issue Type: Task
  Security Level: Public (Viewable by anyone)
  Components: Website
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The HCatalog download page (https://hive.apache.org/general/hcatalogdownloads/) 
is mostly there for historical reasons. It was probably useful back in 2013 to 
inform users about the merge of HCatalog into Hive, but for the past 10 years 
HCatalog has been released as part of Hive, so anyone using that module has no 
need for that obsolete page.

Moreover, the presence of the HCatalog download page adds an additional level 
of indirection for users that want to download recent Hive releases.





[jira] [Commented] (HIVE-28551) Stale results when executing queries over recreated transactional tables

2024-10-02 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886366#comment-17886366
 ] 

Stamatis Zampetakis commented on HIVE-28551:


It seems that after HIVE-19820 there were changes in some .q.out files related 
to the invalidation of the query result cache (e.g., results_cache_truncate.q), 
so this bug could be a regression from that ticket.

> Stale results when executing queries over recreated transactional tables
> 
>
> Key: HIVE-28551
> URL: https://issues.apache.org/jira/browse/HIVE-28551
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 4.0.1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Attachments: results_cache_invalidation3.q
>
>
> SQL queries return stale results from the cache when the tables involved in 
> the queries are dropped and then recreated with the same name.
> The problem can be reproduced by executing the following sequence of queries.
> {code:sql}
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Victor');
> SELECT fname FROM author;
> DROP TABLE author;
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Alexander');
> SELECT fname FROM author;
> {code}
> The first execution of the SELECT query correctly returns "Victor" as a 
> result.
> The second execution of the SELECT query incorrectly returns "Victor" while 
> it should return "Alexander".
> The problem manifests only when hive.query.results.cache.enabled is set to 
> true.





[jira] [Commented] (HIVE-28551) Stale results when executing queries over recreated transactional tables

2024-10-02 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886365#comment-17886365
 ] 

Stamatis Zampetakis commented on HIVE-28551:


The problem can be reproduced on master (commit 
6e261a32657d36185654dd05af7215fe33123878) by running 
[^results_cache_invalidation3.q]

{noformat}
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite 
-Dqfile=results_cache_invalidation3.q -pl itests/qtest -Pitests
{noformat}


> Stale results when executing queries over recreated transactional tables
> 
>
> Key: HIVE-28551
> URL: https://issues.apache.org/jira/browse/HIVE-28551
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 4.0.1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Attachments: results_cache_invalidation3.q
>
>
> SQL queries return stale results from the cache when the tables involved in 
> the queries are dropped and then recreated with the same name.
> The problem can be reproduced by executing the following sequence of queries.
> {code:sql}
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Victor');
> SELECT fname FROM author;
> DROP TABLE author;
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Alexander');
> SELECT fname FROM author;
> {code}
> The first execution of the SELECT query correctly returns "Victor" as a 
> result.
> The second execution of the SELECT query incorrectly returns "Victor" while 
> it should return "Alexander".
> The problem manifests only when hive.query.results.cache.enabled is set to 
> true.





[jira] [Updated] (HIVE-28551) Stale results when executing queries over recreated transactional tables

2024-10-02 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-28551:
---
Attachment: results_cache_invalidation3.q

> Stale results when executing queries over recreated transactional tables
> 
>
> Key: HIVE-28551
> URL: https://issues.apache.org/jira/browse/HIVE-28551
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 4.0.1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Attachments: results_cache_invalidation3.q
>
>
> SQL queries return stale results from the cache when the tables involved in 
> the queries are dropped and then recreated with the same name.
> The problem can be reproduced by executing the following sequence of queries.
> {code:sql}
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Victor');
> SELECT fname FROM author;
> DROP TABLE author;
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Alexander');
> SELECT fname FROM author;
> {code}
> The first execution of the SELECT query correctly returns "Victor" as a 
> result.
> The second execution of the SELECT query incorrectly returns "Victor" while 
> it should return "Alexander".
> The problem manifests only when hive.query.results.cache.enabled is set to 
> true.





[jira] [Commented] (HIVE-28551) Stale results when executing queries over recreated transactional tables

2024-10-02 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886357#comment-17886357
 ] 

Stamatis Zampetakis commented on HIVE-28551:


HIVE-19154 adds an event-based cache invalidation mechanism that is able to 
remove stale entries from the cache at certain intervals, but this is mostly a 
performance improvement and not a mechanism to guarantee correctness. The cache 
should never return stale entries for transactional tables.
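The distinction between periodic cleanup and correctness can be illustrated with a small sketch. This is not Hive's implementation (names like `ResultsCache` and the integer "snapshot" are hypothetical stand-ins for Hive's valid write-id lists): each cached entry records the table snapshot it was computed against, and a lookup rejects any entry whose snapshot is no longer current, so correctness never depends on a background sweeper.

```python
# Sketch of a results cache that stays correct without relying on
# background invalidation: every entry remembers the table snapshot
# (here a plain counter standing in for a write-id list) it was
# computed against, and lookups reject entries whose snapshot is
# no longer current.
class ResultsCache:
    def __init__(self):
        self._entries = {}  # query text -> (snapshot, rows)

    def put(self, query, snapshot, rows):
        self._entries[query] = (snapshot, rows)

    def get(self, query, current_snapshot):
        hit = self._entries.get(query)
        if hit is None:
            return None
        snapshot, rows = hit
        if snapshot != current_snapshot:
            # Stale: the table changed since the entry was cached.
            del self._entries[query]
            return None
        return rows
```

Under this scheme a background task only reclaims memory earlier; it is never needed for correct answers.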

> Stale results when executing queries over recreated transactional tables
> 
>
> Key: HIVE-28551
> URL: https://issues.apache.org/jira/browse/HIVE-28551
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 4.0.1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> SQL queries return stale results from the cache when the tables involved in 
> the queries are dropped and then recreated with the same name.
> The problem can be reproduced by executing the following sequence of queries.
> {code:sql}
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Victor');
> SELECT fname FROM author;
> DROP TABLE author;
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Alexander');
> SELECT fname FROM author;
> {code}
> The first execution of the SELECT query correctly returns "Victor" as a 
> result.
> The second execution of the SELECT query incorrectly returns "Victor" while 
> it should return "Alexander".
> The problem manifests only when hive.query.results.cache.enabled is set to 
> true.





[jira] [Commented] (HIVE-28551) Stale results when executing queries over recreated transactional tables

2024-10-02 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886351#comment-17886351
 ] 

Stamatis Zampetakis commented on HIVE-28551:


Based on the [initial 
design|https://issues.apache.org/jira/browse/HIVE-18513?focusedCommentId=16484227&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16484227]
 of the query result cache, dropping or altering a transactional table should 
result in automatic invalidation of the cached entries.

It is OK to return stale results for non-transactional tables (when the 
hive.query.results.cache.nontransactional.tables.enabled property is true) but 
it is *not* OK to return stale results for transactional tables.

> Stale results when executing queries over recreated transactional tables
> 
>
> Key: HIVE-28551
> URL: https://issues.apache.org/jira/browse/HIVE-28551
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 4.0.1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> SQL queries return stale results from the cache when the tables involved in 
> the queries are dropped and then recreated with the same name.
> The problem can be reproduced by executing the following sequence of queries.
> {code:sql}
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Victor');
> SELECT fname FROM author;
> DROP TABLE author;
> CREATE TABLE author (fname STRING) STORED AS ORC 
> TBLPROPERTIES('transactional'='true');
> INSERT INTO author VALUES ('Alexander');
> SELECT fname FROM author;
> {code}
> The first execution of the SELECT query correctly returns "Victor" as a 
> result.
> The second execution of the SELECT query incorrectly returns "Victor" while 
> it should return "Alexander".
> The problem manifests only when hive.query.results.cache.enabled is set to 
> true.





[jira] [Created] (HIVE-28551) Stale results when executing queries over recreated transactional tables

2024-10-02 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28551:
--

 Summary: Stale results when executing queries over recreated 
transactional tables
 Key: HIVE-28551
 URL: https://issues.apache.org/jira/browse/HIVE-28551
 Project: Hive
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: HiveServer2
Affects Versions: 4.0.1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


SQL queries return stale results from the cache when the tables involved in the 
queries are dropped and then recreated with the same name.

The problem can be reproduced by executing the following sequence of queries.
{code:sql}
CREATE TABLE author (fname STRING) STORED AS ORC 
TBLPROPERTIES('transactional'='true');
INSERT INTO author VALUES ('Victor');
SELECT fname FROM author;

DROP TABLE author;

CREATE TABLE author (fname STRING) STORED AS ORC 
TBLPROPERTIES('transactional'='true');
INSERT INTO author VALUES ('Alexander');
SELECT fname FROM author;
{code}
The first execution of the SELECT query correctly returns "Victor" as a result.
The second execution of the SELECT query incorrectly returns "Victor" while it 
should return "Alexander".

The problem manifests only when hive.query.results.cache.enabled is set to 
true.
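The failure mode above is characteristic of a cache keyed only by table name. A minimal sketch (hypothetical names; not Hive's actual data structures) shows why a name-based key survives a DROP + CREATE cycle while a key based on a per-creation table identity does not:

```python
# Why keying cached results by table *name* goes stale across
# DROP + CREATE: the recreated table reuses the name, so the old
# cache entry still matches. Keying by a per-creation table id
# (assigned fresh on every CREATE) turns the lookup into a miss.
import itertools

_table_ids = itertools.count(1)

class Catalog:
    def __init__(self):
        self.tables = {}  # table name -> per-creation table id

    def create(self, name):
        self.tables[name] = next(_table_ids)

    def drop(self, name):
        del self.tables[name]

cache_by_name = {}
cache_by_id = {}

catalog = Catalog()
catalog.create("author")
cache_by_name["author"] = ["Victor"]
cache_by_id[catalog.tables["author"]] = ["Victor"]

old_id = catalog.tables["author"]
catalog.drop("author")
catalog.create("author")  # same name, new identity

stale_by_name = cache_by_name.get("author")             # still "Victor"
miss_by_id = cache_by_id.get(catalog.tables["author"])  # correctly a miss
```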





[jira] [Commented] (HIVE-28538) Post JDK11: Local Test Failures for TestGenericUDFFromUnixTimeEvaluate and TestMiniLlapLocalCliDriver.udf_date_format.q

2024-09-26 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884908#comment-17884908
 ] 

Stamatis Zampetakis commented on HIVE-28538:


This means that Hive users who upgrade to JDK 11 may observe behavior 
changes when using the SQL functions outlined. Many thanks for this analysis; 
it will be helpful for people doing upgrades.

> Post JDK11: Local Test Failures for TestGenericUDFFromUnixTimeEvaluate and 
> TestMiniLlapLocalCliDriver.udf_date_format.q
> ---
>
> Key: HIVE-28538
> URL: https://issues.apache.org/jira/browse/HIVE-28538
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Reporter: shivangi
>Assignee: shivangi
>Priority: Major
>
> *Issue:*
> Post JDK11, there are test failures occurring locally in the following test 
> classes:
>  # {{TestGenericUDFFromUnixTimeEvaluate}}
>  # {{TestMiniLlapLocalCliDriver.udf_date_format.q}}
> *Error:*
>  * *from_unixtime(1689930780, yyyy-MM-dd HH:mmaa) sessionZone=Etc/GMT, 
> formatter=SIMPLE expected:<2023-07-21 09:13[AM]> but was:<2023-07-21 
> 09:13[am]>*
> *Root Cause Analysis (RCA):*
> In JDK11, changes were made to the Java locale handling as documented in the 
> following issues:
>  * [JDK-8145136|https://bugs.openjdk.org/browse/JDK-8145136]
>  * [JDK-8211985|https://bugs.openjdk.org/browse/JDK-8211985]
> These changes result in different locale behaviors. This issue does not occur 
> in Jenkins because the test cases are written according to the {{en_US}} 
> locale. However, when run locally, the locale may differ (e.g., 
> {{{}en_IN{}}}), causing discrepancies. For instance, {{en_IN}} would expect 
> {{am/pm}} while the tests expect {{{}AM/PM{}}}, leading to failures.
> *Solution:*
> To ensure consistent behavior across both local and Jenkins environments, 
> pass the locale in {{}} to align both environments to 
> {{en_US}}.
> This will ensure that all tests run with the {{en_US}} locale, mitigating 
> locale-related test failures.
>  
> *Additional Notes:*
>  * There may be other test failures not yet captured due to different locales 
> in local environments. Ensure all tests run with the {{en_US}} locale to 
> identify and resolve any further issues.
>  
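
The locale pitfall described above is not Java-specific; a minimal Python sketch of the same idea (pinning the time locale, here to the always-available "C" locale, which renders uppercase AM/PM) shows why forcing one locale makes the formatted output machine-independent:

```python
import locale
from datetime import datetime

# Pin LC_TIME to the portable "C" locale so AM/PM markers render
# identically on every machine, mirroring the fix of forcing en_US
# for the Java test runs (an en_IN machine would otherwise produce
# lowercase "am"/"pm").
locale.setlocale(locale.LC_TIME, "C")

stamp = datetime(2023, 7, 21, 9, 13)
marker = stamp.strftime("%p")  # "AM" under the C locale
```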





[jira] [Commented] (HIVE-28538) Post JDK11: Local Test Failures for TestGenericUDFFromUnixTimeEvaluate and TestMiniLlapLocalCliDriver.udf_date_format.q

2024-09-26 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884906#comment-17884906
 ] 

Stamatis Zampetakis commented on HIVE-28538:


This is similar to HIVE-28381. How are these two related?

> Post JDK11: Local Test Failures for TestGenericUDFFromUnixTimeEvaluate and 
> TestMiniLlapLocalCliDriver.udf_date_format.q
> ---
>
> Key: HIVE-28538
> URL: https://issues.apache.org/jira/browse/HIVE-28538
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Reporter: shivangi
>Assignee: shivangi
>Priority: Major
>
> *Issue:*
> Post JDK11, there are test failures occurring locally in the following test 
> classes:
>  # {{TestGenericUDFFromUnixTimeEvaluate}}
>  # {{TestMiniLlapLocalCliDriver.udf_date_format.q}}
> *Error:*
>  * *from_unixtime(1689930780, yyyy-MM-dd HH:mmaa) sessionZone=Etc/GMT, 
> formatter=SIMPLE expected:<2023-07-21 09:13[AM]> but was:<2023-07-21 
> 09:13[am]>*
> *Root Cause Analysis (RCA):*
> In JDK11, changes were made to the Java locale handling as documented in the 
> following issues:
>  * [JDK-8145136|https://bugs.openjdk.org/browse/JDK-8145136]
>  * [JDK-8211985|https://bugs.openjdk.org/browse/JDK-8211985]
> These changes result in different locale behaviors. This issue does not occur 
> in Jenkins because the test cases are written according to the {{en_US}} 
> locale. However, when run locally, the locale may differ (e.g., 
> {{{}en_IN{}}}), causing discrepancies. For instance, {{en_IN}} would expect 
> {{am/pm}} while the tests expect {{{}AM/PM{}}}, leading to failures.
> *Solution:*
> To ensure consistent behavior across both local and Jenkins environments, 
> pass the locale in {{}} to align both environments to 
> {{en_US}}.
> This will ensure that all tests run with the {{en_US}} locale, mitigating 
> locale-related test failures.
>  
> *Additional Notes:*
>  * There may be other test failures not yet captured due to different locales 
> in local environments. Ensure all tests run with the {{en_US}} locale to 
> identify and resolve any further issues.
>  





[jira] [Commented] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null

2024-09-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884538#comment-17884538
 ] 

Stamatis Zampetakis commented on HIVE-25351:


It seems that this problem also affects the respective rule in Calcite 
(CALCITE-6080). If that's the case, I would suggest first tackling the problem 
there and then bringing the changes into Hive.
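The NaN in the CBO plan's output is characteristic of decomposing stddev into sum/sum-of-squares/count aggregates. A small Python sketch, using the values from the repro script, shows how the one-pass variance formula can suffer catastrophic cancellation and go slightly negative, at which point a square root produces NaN (clamping at zero, or a two-pass formula, avoids it):

```python
import math

# Six identical values, as in the repro script: the true variance is 0.
vals = [10230.72] * 6
n = len(vals)
s = sum(vals)
ss = sum(v * v for v in vals)

# One-pass ("textbook") population variance: E[x^2] - (E[x])^2.
# With floating point, cancellation between two nearly equal large
# numbers can leave a tiny negative residue here; math.sqrt of a
# negative float raises, and NaN is what SQL engines typically emit.
var = ss / n - (s / n) ** 2

# Guarding the formula restores the correct stddev of 0 (up to
# floating-point noise).
stddev = math.sqrt(max(var, 0.0))
```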

> stddev(), stddev_pop() with CBO enable returning null
> -
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Jiandan Yang 
>Priority: Blocker
>  Labels: pull-request-available
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   Select Operator [SEL_7] (rows=6 width=232) |
> | Outp

[jira] [Commented] (HIVE-28014) to_unix_timestamp udf produces inconsistent results in different jdk versions

2024-09-23 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883788#comment-17883788
 ] 

Stamatis Zampetakis commented on HIVE-28014:


I suppose that now that HIVE-28337 is fixed, the first failure reported here 
in TestMetaStoreUtils should no longer appear.

> to_unix_timestamp udf produces inconsistent results in different jdk versions
> -
>
> Key: HIVE-28014
> URL: https://issues.apache.org/jira/browse/HIVE-28014
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-beta-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>
> In HIVE-27999 we updated the CI docker image, which upgrades JDK 8 from 
> {*}1.8.0_262-b19{*} to *1.8.0_392-b08*. This upgrade caused 3 
> timestamp-related tests to fail:
> *1. Testing / split-02 / PostProcess / 
> testTimestampToString[zoneId=Europe/Paris, timestamp=2417-03-26T02:08:43] – 
> org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils*
> {code:bash}
> Error
> expected:<2417-03-26 0[2]:08:43> but was:<2417-03-26 0[3]:08:43>
> Stacktrace
> org.junit.ComparisonFailure: expected:<2417-03-26 0[2]:08:43> but 
> was:<2417-03-26 0[3]:08:43>
>   at 
> org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils.testTimestampToString(TestMetaStoreUtils.java:85)
> {code}
> *2. Testing / split-01 / PostProcess / testCliDriver[udf5] – 
> org.apache.hadoop.hive.cli.split24.TestMiniLlapLocalCliDriver*
> {code:bash}
> Error
> Client Execution succeeded but contained differences (error code = 1) after 
> executing udf5.q 
> 263c263
> < 1400-11-08 07:35:34
> ---
> > 1400-11-08 07:35:24
> 272c272
> < 1800-11-08 07:35:34
> ---
> > 1800-11-08 07:35:24
> 434c434
> < 1399-12-31 23:35:34
> ---
> > 1399-12-31 23:35:24
> 443c443
> < 1799-12-31 23:35:34
> ---
> > 1799-12-31 23:35:24
> 452c452
> < 1899-12-31 23:35:34
> ---
> > 1899-12-31 23:35:24
> {code}
> *3. Testing / split-19 / PostProcess / testStringArg2 – 
> org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp*
> {code:bash}
> Stacktrace
> org.junit.ComparisonFailure: expected:<-17984790[40]0> but 
> was:<-17984790[39]0>
>   at org.junit.Assert.assertEquals(Assert.java:117)
>   at org.junit.Assert.assertEquals(Assert.java:146)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp.runAndVerify(TestGenericUDFToUnixTimestamp.java:70)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp.testStringArg2(TestGenericUDFToUnixTimestamp.java:167)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> It may be a JDK bug fixed in the newer release, because we could get the 
> same result from Spark:
> {code:sql}
> spark-sql> select to_unix_timestamp(to_timestamp("1400-02-01 00:00:00 ICT", 
> "-MM-dd HH:mm:ss z"), "US/Pacific");
> -17984790390
> {code}





[jira] [Resolved] (HIVE-28337) Process timestamps at UTC timezone instead of local timezone in MetaStoreUtils

2024-09-23 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28337.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
[https://github.com/apache/hive/commit/e31811bb7c6670ab1f725adde3aa2b012ca64415]

Thanks for the PR [~kiranvelumuri] and for the review [~wechar] !

> Process timestamps at UTC timezone instead of local timezone in MetaStoreUtils
> --
>
> Key: HIVE-28337
> URL: https://issues.apache.org/jira/browse/HIVE-28337
> Project: Hive
>  Issue Type: Bug
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: image-2024-06-18-12-42-05-646.png, 
> image-2024-06-18-12-42-31-472.png
>
>
> Currently in MetaStoreUtils, the conversion between timestamp and string 
> makes use of LocalDateTime in the local time zone while processing 
> timestamps. This causes issues with representing some timestamps, as 
> described below. Instead, it is proposed to use java.time.Instant to 
> represent a point on the time-line, which overcomes the issue. Accordingly, 
> the test class for MetaStoreUtils (TestMetaStoreUtils) has also been 
> modified to account for these changes.
> +Failing scenario:+
> Timestamps in time-zones which observe daylight savings during which the 
> clock is set forward (typically 2:00 AM - 3:00 AM)
> Example: 2417-03-26T02:08:43 in Europe/Paris is invalid, and would get 
> converted to 2417-03-26T03:08:43 by Timestamp.valueOf() method, when instead 
> we want to represent the original timestamp without conversion.
> This is happening due to representing timestamp as LocalDateTime in 
> TestMetaStoreUtils, which is independent of the time-zone of the timestamp. 
> This LocalDateTime timestamp when combined with time-zone is leading to 
> invalid timestamp.
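
The same daylight-saving gap behavior can be demonstrated in Python, using the actual 2024 Europe/Paris transition date (2024-03-31) as an analogous example to the far-future one in the description. A wall-clock time inside the gap silently shifts forward once it is normalized through an absolute instant, which is the kind of conversion Timestamp.valueOf performs implicitly:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # requires an available tz database

# 2024-03-31 02:08:43 does not exist in Europe/Paris: clocks jump
# from 02:00 to 03:00 that night. Round-tripping the wall-clock
# time through an absolute instant (UTC) normalizes it to 03:08:43,
# i.e. the original local time cannot be represented faithfully.
paris = ZoneInfo("Europe/Paris")
gap_time = datetime(2024, 3, 31, 2, 8, 43, tzinfo=paris)

normalized = gap_time.astimezone(timezone.utc).astimezone(paris)
```

Representing the value as a point on the time-line from the start (java.time.Instant in the proposed fix) sidesteps this, because an instant never passes through an ambiguous or nonexistent local wall-clock reading.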





[jira] [Resolved] (HIVE-28483) CAST string to date should return null when format is invalid

2024-09-23 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28483.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
[https://github.com/apache/hive/commit/e87b5bb8f4bded30b17e46ee573151488c78d178].

Thanks for the PR [~zratkai] !

The new behavior was also discussed in the mailing lists: 
https://lists.apache.org/thread/blo8ozrhmh1jq9c0oz8bhm39lpb95bbv
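
The fixed semantics (NULL instead of a misparsed date for input that does not match the expected pattern) can be sketched as follows. This is an illustrative stand-in, not Hive's implementation; the helper name is hypothetical and only the canonical yyyy-MM-dd form is modeled:

```python
from datetime import date, datetime

def cast_string_to_date(s):
    # Strict parse against the canonical yyyy-MM-dd form; any
    # mismatch yields None (SQL NULL) rather than a garbage date
    # such as 0003-08-20 for the input '03-08-2024'.
    try:
        return datetime.strptime(s.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None
```

With lenient parsing, '03-08-2024' was read as year 3, month 8, day 20; under strict parsing the leftover characters make the cast fail and return NULL, matching the behavior of the other databases surveyed below.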

> CAST string to date should return null when format is invalid
> -
>
> Key: HIVE-28483
> URL: https://issues.apache.org/jira/browse/HIVE-28483
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Date conversion gives wrong results. For example:
> select to_date('03-08-2024');
> Result:
> +-+
> |    _c0    |
> +-+
> |0003-08-20  |
> +-+
> or:
> select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ;
> Result:
> +-+
> |    _c0    |
> +-+
> |0003-07-31  |
> +-
> Here is my comparison with other database systems:
> +
> --
> PostgreSQL
> --
> SELECT TO_DATE('03-08-2024','YYYYMMDD');
> invalid value "03-0" for "YYYY" DETAIL: Field requires 4 characters, but only 
> 2 could be parsed. HINT: If your source string is not fixed-width, try using 
> the "FM" modifier. 
> SELECT TO_DATE('03-08-2024','DD-MM-YYYY');
> to_date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('03-08-2024' AS date);
> date
> Fri, 08 Mar 2024 00:00:00 GMT
> SELECT CAST('2024-08-03' AS date);
> date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('2024-08-03 T' AS date);
> invalid input syntax for type date: "2024-08-03 T" LINE 1: SELECT 
> CAST('2024-08-03 T' AS date) ^ 
> SELECT CAST('2024-08-03T' AS date);
> invalid input syntax for type date: "2024-08-03T" LINE 1: SELECT 
> CAST('2024-08-03T' AS date) ^ 
> SELECT CAST('2024-08-03T12:00:00' AS date);
> date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('2024-08-0312:00:00' AS date);
> date/time field value out of range: "2024-08-0312:00:00" LINE 1: SELECT 
> CAST('2024-08-0312:00:00' AS date) ^ HINT: Perhaps you need a different 
> "datestyle" setting. 
> --
> -ORACLE---
> --
> select CAST('2024-08-03 12:00:00' AS date) from dual;
> Output:
> select CAST('2024-08-03 12:00:00' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> select CAST('2024-08-03' AS date) from dual;
> Output:
> select CAST('2024-08-03' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> SELECT TO_DATE('08/03/2024', 'MM/DD/YYYY') FROM DUAL;
> Output:
> TO_DATE('
> -
> 03-AUG-24
> -
> SELECT TO_DATE('2024-08-03', 'YYYY-MM-DD') FROM DUAL;
> Output:
> TO_DATE('
> -
> 03-AUG-24
> -
> select CAST('03-08-2024' AS date) from dual;
> Output:
> select CAST('03-08-2024' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01843: An invalid month was specified.
> -
> select CAST('2024-08-0312:00:00' AS date) from dual;
> Output:
> select CAST('2024-08-0312:00:00' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> select CAST('10-AUG-24' AS date) from dual;
> Output:
> CAST('10-
> -
> 10-AUG-24
> -
> select CAST('10-AUG-2024' AS date) from dual;
> Output:
> CAST('10-
> -
> 10-AUG-24
> -
> select CAST('03-08-24' AS date) from dual;
> Output:
> select CAST('03-08-24' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01843: An invalid month was specified.
>  
> --
> select CAST('03-08-2024' AS date) from dual;
> Output:
> select CAST('03-08-2024' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01843: An invalid month was specified.
> 
> SELECT sysdate FROM DUAL;
> Output:
> SYSDATE
> -
> 10-SEP-24

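The behavior change proposed in the ticket above (return NULL instead of a bogus date when the string does not match the expected format) can be sketched in Python. This is an illustrative model of lenient prefix parsing versus strict parsing, an assumption for illustration, not Hive's actual implementation:

```python
import re
from datetime import date

# Illustrative model (assumption, not Hive's actual code) of why
# CAST('03-08-2024' AS DATE) used to yield 0003-08-20: a lenient parser
# reads up to 4 digits for the year, then up to 2 for month and day,
# and silently ignores trailing characters ("24" of "2024" is dropped).
LENIENT = re.compile(r'(\d{1,4})-(\d{1,2})-(\d{1,2})')
# Strict variant matching the proposed behavior: the whole string must
# be a well-formed yyyy-MM-dd date, otherwise the cast returns NULL.
STRICT = re.compile(r'(\d{4})-(\d{2})-(\d{2})$')

def cast_to_date(s, pattern):
    m = pattern.match(s)
    if not m:
        return None  # models SQL NULL
    try:
        return date(*map(int, m.groups()))
    except ValueError:
        return None  # e.g. month 13 or day 32

print(cast_to_date('03-08-2024', LENIENT))  # 0003-08-20 (year=03, day=20!)
print(cast_to_date('03-08-2024', STRICT))   # None, i.e. NULL
print(cast_to_date('2024-08-03', STRICT))   # 2024-08-03
```

Under this model the lenient parser reproduces the surprising `0003-08-20` result reported above, while the strict variant behaves like PostgreSQL/Oracle and rejects the value.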
[jira] [Updated] (HIVE-28483) CAST string to date should return null when format is invalid

2024-09-23 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-28483:
---
Summary: CAST string to date should return null when format is invalid  
(was: String date cast giving wrong result)

> CAST string to date should return null when format is invalid
> -
>
> Key: HIVE-28483
> URL: https://issues.apache.org/jira/browse/HIVE-28483
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Minor
>  Labels: pull-request-available
>
[jira] [Commented] (HIVE-28483) String date cast giving wrong result

2024-09-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882943#comment-17882943
 ] 

Stamatis Zampetakis commented on HIVE-28483:


For the behavior of the CAST function across Hive versions, I added some more 
detailed tests in HIVE-27586.

> String date cast giving wrong result
> 
>
> Key: HIVE-28483
> URL: https://issues.apache.org/jira/browse/HIVE-28483
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Minor
>  Labels: pull-request-available
>

[jira] [Commented] (HIVE-27586) Parse dates from strings ignoring trailing (potentially) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882942#comment-17882942
 ] 

Stamatis Zampetakis commented on HIVE-27586:


In light of HIVE-28483, I performed a series of tests to document the behavior 
of parsing dates from strings across some major Hive versions. Date parsing 
appears in various places and may differ slightly across SQL functions, so in 
the tests that follow I only examined the results of CAST (V AS DATE), which is 
probably the most popular way of performing string-to-date conversions. For 
various SQL functions the behavior of the vectorized and non-vectorized 
implementations is not aligned, so the tests include both variants.

 !cast_string_date_hive_versions.svg! 

The tests were performed using the script in the [^cast_as_date.q] file and 
were run with the following command.

{noformat}
mvn test -Dtest=TestCliDriver -Dqfile=cast_as_date.q -Phadoop-2  
-Dtest.output.overwrite
{noformat}

Note that the hadoop-2 profile is necessary for building older versions of Hive.

> Parse dates from strings ignoring trailing (potentially) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_as_date.q, cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg
>
>
> The goal of this ticket is to extract and return a valid date from a string 
> value when there is a valid date prefix in the string.
> The following table contains a few illustrative examples highlighting what 
> happens now and what will happen after the proposed changes to ignore 
> trailing characters. HIVE-20007 introduced some behavior changes around this 
> area so the table also displays what was the Hive behavior before that change.
> ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing 
> chars||
> |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03|
> |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03|
> |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03|
> |4|03-08-2023|0009-02-12|null|0003-08-20|
> |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03|
> This change partially reverts the behavior changes introduced by HIVE-20007 
> (see examples 3 and 4 for the exceptions) and at the same time makes the 
> handling of trailing invalid chars more uniform. 
> This change will have an impact on various Hive SQL functions and operators 
> (+/-) that accept dates from string values. A partial list of affected 
> functions is outlined below:
> * CAST (V AS DATE)
> * CAST (V AS TIMESTAMP)
> * TO_DATE
> * DATE_ADD
> * DATE_DIFF
> * WEEKOFYEAR
> * DAYOFWEEK
> * TRUNC
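The "Ignore trailing chars" column of the table above can be modeled with a prefix-based parser. A minimal sketch of the intended semantics (an assumption for illustration, not Hive's code):

```python
import re
from datetime import date

# Sketch of the proposed semantics: parse a leading y-M-d prefix (1-4
# digit year, 1-2 digit month/day) and ignore whatever follows.
# This reproduces the "Ignore trailing chars" column of the table above.
PREFIX = re.compile(r'(\d{1,4})-(\d{1,2})-(\d{1,2})')

def parse_date_prefix(s):
    m = PREFIX.match(s)
    if not m:
        return None  # no valid date prefix -> NULL
    try:
        return date(*map(int, m.groups()))
    except ValueError:
        return None  # prefix digits do not form a real date

for v in ('2023-08-03_16:02:00', '2023-08-0316:02:00',
          '03-08-2023', '2023-08-03 GARBAGE'):
    print(v, '->', parse_date_prefix(v))
```

Note how this single rule covers both the "garbage suffix" rows (1, 2, 5-7) and the surprising rows 3 and 4, where the greedy digit groups stop early and produce `2023-08-03` and `0003-08-20` respectively.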



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentially) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27586:
---
Attachment: cast_string_date_hive_versions.svg

> Parse dates from strings ignoring trailing (potentially) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.svg
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentially) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27586:
---
Attachment: cast_as_date.q

> Parse dates from strings ignoring trailing (potentially) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_as_date.q, cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentially) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27586:
---
Attachment: cast_string_date_hive_versions.png

> Parse dates from strings ignoring trailing (potentially) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentially) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27586:
---
Attachment: cast_string_date_hive_versions.pdf

> Parse dates from strings ignoring trailing (potentially) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_string_date_hive_versions.pdf
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28519) Upgrade Maven SureFire Plugin to latest version 3.5.0

2024-09-11 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881049#comment-17881049
 ] 

Stamatis Zampetakis edited comment on HIVE-28519 at 9/11/24 4:47 PM:
-

Upgrading Surefire is far from trivial. I am working on it as part of 
HIVE-26332. 

[~Indhumathi27] If you want to take over I can leave it to you but I would 
suggest you check HIVE-26332 and the history behind it.


was (Author: zabetak):
Upgrading Surefire is far from trivial. I am working on it as part of 
HIVE-26332. 

[~Indhumathi27] If you want to take over I can leave it to you but I would 
suggest you check HIVE-26332 and the history behind i.t

> Upgrade Maven SureFire Plugin to latest version 3.5.0
> -
>
> Key: HIVE-28519
> URL: https://issues.apache.org/jira/browse/HIVE-28519
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28519) Upgrade Maven SureFire Plugin to latest version 3.5.0

2024-09-11 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881049#comment-17881049
 ] 

Stamatis Zampetakis commented on HIVE-28519:


Upgrading Surefire is far from trivial. I am working on it as part of 
HIVE-26332. 

[~Indhumathi27] If you want to take over I can leave it to you, but I would 
suggest you check HIVE-26332 and the history behind it.

> Upgrade Maven SureFire Plugin to latest version 3.5.0
> -
>
> Key: HIVE-28519
> URL: https://issues.apache.org/jira/browse/HIVE-28519
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28499) Intellij Idea2024 can't import iceberg&hive checkstyle files

2024-09-11 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880889#comment-17880889
 ] 

Stamatis Zampetakis commented on HIVE-28499:


{noformat}
find . -name "checkstyle.xml"
./standalone-metastore/checkstyle/checkstyle.xml
./checkstyle/checkstyle.xml
./iceberg/checkstyle/checkstyle.xml
./hcatalog/build-support/ant/checkstyle.xml
./storage-api/checkstyle/checkstyle.xml
{noformat}


> Intellij Idea2024 can't import iceberg&hive checkstyle files
> 
>
> Key: HIVE-28499
> URL: https://issues.apache.org/jira/browse/HIVE-28499
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Priority: Major
> Attachments: idea2024-checkstyle-8.28.jpg, 
> idea2024-checkstyle-tool-error.jpg, 
> idea2024-hive-checkstyle-8.28versiion-failed.jpg, 
> idea2024-import-iceberg-checkstyle_1.jpg, 
> idea2024-import-iceberg-checkstyle_2.jpg, 
> idea2024-import-iceberg-checkstyle_error_3.jpg, 
> idea2024-with-checkstyle-plugin-5.86.0.jpg, install_checkstyle_plugin.jpg
>
>
> I upgraded my IntelliJ from the 2022 to the 2024 version, and I found that I 
> can't import the iceberg & hive checkstyle files.
> {code:java}
> IntelliJ IDEA 2024.2.1 (Ultimate Edition)
> Build #IU-242.21829.142, built on August 28, 2024{code}
> Here are some screenshots & steps from my IntelliJ 2024:
> 1. Install CheckStyle-IDEA plugin
> !install_checkstyle_plugin.jpg!
> 2. import hive-iceberg checkstyle files using Code Style setting
> !idea2024-import-iceberg-checkstyle_1.jpg!
>  
> import this file 
> [https://github.com/apache/hive/blob/master/iceberg/checkstyle/checkstyle.xml]
> !idea2024-import-iceberg-checkstyle_2.jpg!
>  
> 3. import checkstyle failed
> !idea2024-import-iceberg-checkstyle_error_3.jpg!
>  
> 4. Checkstyle tool also failed
> {code:java}
> com.puppycrawl.tools.checkstyle.api.CheckstyleException: 
> SuppressWithNearbyCommentFilter is not allowed as a child in Checker
>     at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:501)
>     at 
> com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:201)
>     at 
> org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61)
>     at 
> org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:116)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51)
>     at 
> org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:42)
>  {code}
> !idea2024-checkstyle-tool-error.jpg!
>  
>  
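The `SuppressWithNearbyCommentFilter is not allowed as a child in Checker` error above typically means the filter is declared at the top level of the configuration file; in recent Checkstyle versions this filter must be nested under the `TreeWalker` module. A hedged sketch of a conforming configuration (the property values here are hypothetical examples, not Hive's actual settings):

```xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker">
    <!-- In current Checkstyle releases this filter must live under
         TreeWalker, not directly under Checker; otherwise loading the
         configuration fails with the exception shown above. -->
    <module name="SuppressWithNearbyCommentFilter">
      <property name="commentFormat" value="CHECKSTYLE IGNORE (\w+)"/>
      <property name="checkFormat" value="$1"/>
    </module>
  </module>
</module>
```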



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-24167) TPC-DS query 14 fails while generating plan for the filter

2024-09-09 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880320#comment-17880320
 ] 

Stamatis Zampetakis commented on HIVE-24167:


If the failures in PlanMapper:
 * cause plan changes then existing tests should be able to capture this since 
the diff with the golden files will fail
 * do not cause plan changes then addressing the failures may not be worth the 
effort

thus I feel that we don't necessarily need a toggle, but I don't feel strongly 
about it. Feel free to choose whatever you think is best.

Skipping the failures is very different from what has been done in the previous 
PRs so I think it is better to put it in a new PR instead of updating the 
existing ones.

> TPC-DS query 14 fails while generating plan for the filter
> --
>
> Key: HIVE-24167
> URL: https://issues.apache.org/jira/browse/HIVE-24167
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.1.0-must, pull-request-available
>
> TPC-DS query 14 (cbo_query14.q and query4.q) fail with NPE on the metastore 
> with the partitioned TPC-DS 30TB dataset while generating the plan for the 
> filter.
> The problem can be reproduced using the PR in HIVE-23965.
> The current stacktrace shows that the NPE appears while trying to display the 
> debug message but even if this line didn't exist it would fail again later on.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10867)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3375)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3473)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10819)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12417)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12519)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(R

[jira] [Commented] (HIVE-28499) Intellij Idea2024 can't import iceberg&hive checkstyle files

2024-09-05 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879482#comment-17879482
 ] 

Stamatis Zampetakis commented on HIVE-28499:


Are the two files so different? If not, it may be worth trying to replace the 
iceberg one with the older one.

From now on, I think we should keep only one style. This way, at least the new 
code will be correctly formatted. Having multiple styles for the same project 
is very strange.

> Intellij Idea2024 can't import iceberg&hive checkstyle files
> 
>
> Key: HIVE-28499
> URL: https://issues.apache.org/jira/browse/HIVE-28499
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Priority: Major
> Attachments: idea2024-checkstyle-8.28.jpg, 
> idea2024-checkstyle-tool-error.jpg, 
> idea2024-hive-checkstyle-8.28versiion-failed.jpg, 
> idea2024-import-iceberg-checkstyle_1.jpg, 
> idea2024-import-iceberg-checkstyle_2.jpg, 
> idea2024-import-iceberg-checkstyle_error_3.jpg, 
> idea2024-with-checkstyle-plugin-5.86.0.jpg, install_checkstyle_plugin.jpg
>
>
> I upgraded my IntelliJ from the 2022 to the 2024 version, and I found that I 
> can't import the iceberg & hive checkstyle files.
> {code:java}
> IntelliJ IDEA 2024.2.1 (Ultimate Edition)
> Build #IU-242.21829.142, built on August 28, 2024{code}
> Here are some screenshots & steps from my IntelliJ 2024:
> 1. Install CheckStyle-IDEA plugin
> !install_checkstyle_plugin.jpg!
> 2. import hive-iceberg checkstyle files using Code Style setting
> !idea2024-import-iceberg-checkstyle_1.jpg!
>  
> import this file 
> [https://github.com/apache/hive/blob/master/iceberg/checkstyle/checkstyle.xml]
> !idea2024-import-iceberg-checkstyle_2.jpg!
>  
> 3. import checkstyle failed
> !idea2024-import-iceberg-checkstyle_error_3.jpg!
>  
> 4. Checkstyle tool also failed
> {code:java}
> com.puppycrawl.tools.checkstyle.api.CheckstyleException: 
> SuppressWithNearbyCommentFilter is not allowed as a child in Checker
>     at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:501)
>     at 
> com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:201)
>     at 
> org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61)
>     at 
> org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:116)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51)
>     at 
> org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:42)
>  {code}
> !idea2024-checkstyle-tool-error.jpg!
>  
>  
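For context, Checkstyle moved SuppressWithNearbyCommentFilter (and the other comment-based suppression filters) under TreeWalker in release 8.1, which is exactly what the "not allowed as a child in Checker" error points at. Below is a minimal sketch of the layout newer Checkstyle versions expect; the surrounding modules are illustrative, not Hive's actual configuration:

```xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<!-- Sketch only: since Checkstyle 8.1 the comment-based suppression
     filters must be children of TreeWalker, not of Checker. -->
<module name="Checker">
  <module name="TreeWalker">
    <module name="SuppressWithNearbyCommentFilter"/>
  </module>
</module>
```

An older configuration that declares the filter directly under Checker fails to load against a plugin bundling a newer Checkstyle, which would explain why the import works in IDEA 2022 but not in IDEA 2024.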



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28499) Intellij Idea2024 can't import iceberg&hive checkstyle files

2024-09-04 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879233#comment-17879233
 ] 

Stamatis Zampetakis commented on HIVE-28499:


The checkstyle configuration for the whole project should be the same. I don't 
know how we ended up with multiple checkstyle files. If we can unify the 
configurations with minimal changes to the source code that would be ideal.

> Intellij Idea2024 can't import iceberg&hive checkstyle files
> 
>
> Key: HIVE-28499
> URL: https://issues.apache.org/jira/browse/HIVE-28499
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Priority: Major
> Attachments: idea2024-checkstyle-8.28.jpg, 
> idea2024-checkstyle-tool-error.jpg, 
> idea2024-hive-checkstyle-8.28versiion-failed.jpg, 
> idea2024-import-iceberg-checkstyle_1.jpg, 
> idea2024-import-iceberg-checkstyle_2.jpg, 
> idea2024-import-iceberg-checkstyle_error_3.jpg, 
> idea2024-with-checkstyle-plugin-5.86.0.jpg, install_checkstyle_plugin.jpg
>
>
> I upgraded my IntelliJ from the 2022 to the 2024 version, and I found that I 
> can't import the iceberg & hive checkstyle files.
> {code:java}
> IntelliJ IDEA 2024.2.1 (Ultimate Edition)
> Build #IU-242.21829.142, built on August 28, 2024{code}
> Here are some screenshots & steps from my IntelliJ 2024:
> 1. Install CheckStyle-IDEA plugin
> !install_checkstyle_plugin.jpg!
> 2. import hive-iceberg checkstyle files using Code Style setting
> !idea2024-import-iceberg-checkstyle_1.jpg!
>  
> import this file 
> [https://github.com/apache/hive/blob/master/iceberg/checkstyle/checkstyle.xml]
> !idea2024-import-iceberg-checkstyle_2.jpg!
>  
> 3. import checkstyle failed
> !idea2024-import-iceberg-checkstyle_error_3.jpg!
>  
> 4. Checkstyle tool also failed
> {code:java}
> com.puppycrawl.tools.checkstyle.api.CheckstyleException: 
> SuppressWithNearbyCommentFilter is not allowed as a child in Checker
>     at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:501)
>     at 
> com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:201)
>     at 
> org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61)
>     at 
> org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:116)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60)
>     at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51)
>     at 
> org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:42)
>  {code}
> !idea2024-checkstyle-tool-error.jpg!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28494) Iceberg: mvn build enables iceberg module by default

2024-09-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878881#comment-17878881
 ] 

Stamatis Zampetakis commented on HIVE-28494:


[~zhangbutao] Isn't this a duplicate of HIVE-25998?

> Iceberg: mvn build enables iceberg module by default
> 
>
> Key: HIVE-28494
> URL: https://issues.apache.org/jira/browse/HIVE-28494
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>
> HIVE-25027 hid the iceberg module by default. IMO, we have put lots of 
> effort into the iceberg module and it is more stable than before. We should 
> enable the iceberg module by default in the mvn build.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26987) InvalidProtocolBufferException when reading column statistics from ORC files

2024-09-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878852#comment-17878852
 ] 

Stamatis Zampetakis commented on HIVE-26987:


[~zhangbutao] We need to attempt to run the repro again and check if it passes 
in order to mark this as resolved. With ORC-1361 we will get an empty list 
instead of an exception but we may need to add some special handling for the 
empty list in Hive.
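For reference, the workaround hinted at by the exception message itself would look roughly like the sketch below. This is not code from Hive or ORC; it assumes protobuf-java and the ORC protos on the classpath, and {{metadataStream}} is a hypothetical InputStream positioned over the metadata section:

```java
// Hedged sketch: lift protobuf's message size limit before parsing a
// metadata section larger than ORC's hardcoded 1GB cap, as suggested by
// "Use CodedInputStream.setSizeLimit() to increase the size limit".
CodedInputStream cis = CodedInputStream.newInstance(metadataStream);
cis.setSizeLimit(Integer.MAX_VALUE); // raise the cap to ~2GB
OrcProto.Metadata metadata = OrcProto.Metadata.parseFrom(cis);
```

With ORC-1361 the read should no longer throw; instead the reader returns an empty statistics list, which is the case Hive may need to handle specially.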

> InvalidProtocolBufferException when reading column statistics from ORC files
> 
>
> Key: HIVE-26987
> URL: https://issues.apache.org/jira/browse/HIVE-26987
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, ORC
>Affects Versions: 3.1.0, 4.0.0-alpha-2
>Reporter: Stamatis Zampetakis
>Priority: Major
> Attachments: data.csv.gz, orc_large_column_metadata.q
>
>
> Any attempt to read an ORC file (query an ORC table) having a metadata 
> section with column statistics exceeding the hardcoded limit of 1GB 
> ([https://github.com/apache/orc/blob/2ff9001ddef082eaa30e21cbb034f266e0721664/java/core/src/java/org/apache/orc/impl/InStream.java#L41])
>  leads to the following exception.
> {noformat}
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message was too large.  May be malicious.  Use 
> CodedInputStream.setSizeLimit() to increase the size limit.
> at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:162)
> at com.google.protobuf.CodedInputStream$StreamDecoder.readRawBytesSlowPathOneChunk(CodedInputStream.java:2940)
> at com.google.protobuf.CodedInputStream$StreamDecoder.readBytesSlowPath(CodedInputStream.java:3021)
> at com.google.protobuf.CodedInputStream$StreamDecoder.readBytes(CodedInputStream.java:2432)
> at org.apache.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1718)
> at org.apache.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1663)
> at org.apache.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1766)
> at org.apache.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1761)
> at com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409)
> at org.apache.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:6552)
> at org.apache.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:6468)
> at org.apache.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:6678)
> at org.apache.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:6673)
> at com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409)
> at org.apache.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:19586)
> at org.apache.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:19533)
> at org.apache.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:19622)
> at org.apache.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:19617)
> at com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409)
> at org.apache.orc.OrcProto$Metadata.<init>(OrcProto.java:20270)
> at org.apache.orc.OrcProto$Metadata.<init>(OrcProto.java:20217)
> at org.apache.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:20306)
> at org.apache.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:20301)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:86)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:91)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
> at org.apache.orc.OrcProto$Metadata.parseFrom(OrcProto.java:20438)
> at org.apache.orc.impl.ReaderImpl.deserializeStripeStats(ReaderImpl.java:1013)
> at org.apache.orc.impl.ReaderImpl.getVariantStripeStatistics(ReaderImpl.java:317)
> at org.apache.orc.impl.ReaderImpl.getStripeStatistics(ReaderImpl.java:1047)
> at org.apache.orc.impl.ReaderImpl.getStripeStatistics(ReaderImpl.java:1034)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1679)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1557)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2900(OrcInputFormat.java:1342)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1529)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputF

[jira] [Commented] (HIVE-28408) Support ARRAY field access in CBO

2024-08-21 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875517#comment-17875517
 ] 

Stamatis Zampetakis commented on HIVE-28408:


To better understand the purpose of this ticket let's use a simpler and more 
readable example.

{code:sql}
CREATE TABLE book (
  bid int,
  title string,
  author struct<
    aid:int,
    name:string,
    addresses: array<struct<street:string,num:int,gcs:struct<latitude:double,longitude:double>>>
  >
) STORED AS PARQUET;

INSERT INTO TABLE book VALUES (
1,
'Les Miserables',
named_struct('aid', 100, 'name', 'Victor-Hugo', 'addresses', array(
named_struct('street', 'Avenue Champs-Elysees', 'num', 42, 'gcs',
named_struct('latitude', 48.8701431D, 'longitude', 2.3051376D)),
named_struct('street', 'Rue de Rivoli', 'num', 8, 'gcs',
named_struct('latitude', 48.8554165D, 'longitude', 2.3582763D))
)));

SELECT author.addresses.gcs.latitude FROM book;
{code}

 The query returns the following result.
{noformat}
[48.8701431,48.8554165]
{noformat}

Observe that "addresses" is an ARRAY of complex/struct type. The 
addresses.gcs.latitude expression aims to drill into/navigate/extract specific 
fields from the ARRAY while keeping the structure intact. This operation is 
similar to XPath and JSON navigational patterns and is not part of the SQL 
standard, although it is supported by some DBMSs.

Currently, when a query contains such expressions, CBO fails and we fall back 
to the legacy optimizer.
{noformat}
 org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: 
Unexpected rexnode : org.apache.calcite.rex.RexFieldAccess
at 
org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createNestedColumnRefExpr(RexNodeExprFactory.java:629)
at 
org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createNestedColumnRefExpr(RexNodeExprFactory.java:97)
at 
org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:903)
at 
org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1481)
at 
org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231)
at 
org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNode(RexNodeTypeCheck.java:40)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5376)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5333)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.internalGenSelectLogicalPlan(CalcitePlanner.java:4660)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:4418)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5087)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1629)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1572)
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1324)
{noformat}

Since the physical layer of Hive is able to handle field access over ARRAY 
types, we have to find a CBO (RexNode) expression that can express field 
access over ARRAY expressions. This can be achieved by introducing a new 
Hive-specific operator (e.g. COMPONENT_ACCESS) that takes an expression of an 
ARRAY type and alters its type so that we can perform field access as if it 
were a regular STRUCT.

The CBO plan for the query above would look like the following:

{noformat}
CBO PLAN:
HiveProject(latitude=[COMPONENT_ACCESS($2.addresses).gcs.latitude])
  HiveTableScan(table=[[default, book]], table:alias=[book])
{noformat}

The new operator acts mainly as syntactic sugar, so when we translate it back 
to AST we can treat it as a NOOP.
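For comparison, the same navigation can be expressed today without the new operator by exploding the array. This is a hedged sketch using Hive's LATERAL VIEW; note that collect_list does not guarantee element order, so it is only roughly equivalent:

```sql
-- Roughly equivalent to: SELECT author.addresses.gcs.latitude FROM book;
-- explode the array, drill into each struct element, re-assemble the list.
SELECT collect_list(addr.gcs.latitude)
FROM book
LATERAL VIEW explode(author.addresses) t AS addr;
```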

A proof of concept using the COMPONENT_ACCESS operator is attac

[jira] [Updated] (HIVE-28408) Support ARRAY field access in CBO

2024-08-21 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-28408:
---
Attachment: HIVE-28408.patch

> Support ARRAY field access in CBO
> -
>
> Key: HIVE-28408
> URL: https://issues.apache.org/jira/browse/HIVE-28408
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
> Attachments: CBO Fallback - Nested column pruning item.docx, 
> HIVE-28408.patch
>
>
> fname=nested_column_pruning.q
> {code:sql}
> EXPLAIN
> SELECT count(s1.f6), s5.f16.f18.f19
> FROM nested_tbl_1_n1
> GROUP BY s5.f16.f18.f19 
> {code}
> {noformat}
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: 
> Unexpected rexnode : org.apache.calcite.rex.RexFieldAccess{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28408) Support ARRAY field access in CBO

2024-08-21 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-28408:
---
Summary: Support ARRAY field access in CBO  (was: Support for the nested 
filed access within Array datatype )

> Support ARRAY field access in CBO
> -
>
> Key: HIVE-28408
> URL: https://issues.apache.org/jira/browse/HIVE-28408
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
> Attachments: CBO Fallback - Nested column pruning item.docx
>
>
> fname=nested_column_pruning.q
> {code:sql}
> EXPLAIN
> SELECT count(s1.f6), s5.f16.f18.f19
> FROM nested_tbl_1_n1
> GROUP BY s5.f16.f18.f19 
> {code}
> {noformat}
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: 
> Unexpected rexnode : org.apache.calcite.rex.RexFieldAccess{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28455) Missing dependencies due to upgrade of maven-shade-plugin

2024-08-21 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875429#comment-17875429
 ] 

Stamatis Zampetakis commented on HIVE-28455:


It is a bad practice for a single module to publish multiple jars, since there 
is no way to publish multiple POM files for the same module. In the presence of 
multiple jars, we have to decide which is the main artifact that should be used 
in dependent components and fix things accordingly.
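One way to implement such a decision is sketched below with standard maven-shade-plugin parameters; whether Hive's POMs should use exactly this configuration is an open question, not a statement of the actual fix. The idea: keep the plain jar as the main artifact, attach the uber jar under a classifier, and disable the dependency-reduced POM so the published POM keeps the transitive dependencies.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- attach the uber jar with a classifier instead of replacing the main jar -->
    <shadedArtifactAttached>true</shadedArtifactAttached>
    <shadedClassifierName>standalone</shadedClassifierName>
    <!-- keep the original <dependencies> in the POM published for the main jar -->
    <createDependencyReducedPom>false</createDependencyReducedPom>
  </configuration>
</plugin>
```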

> Missing dependencies due to upgrade of maven-shade-plugin
> -
>
> Key: HIVE-28455
> URL: https://issues.apache.org/jira/browse/HIVE-28455
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1, 4.1.0
>Reporter: Kokila N
>Assignee: Kokila N
>Priority: Major
>  Labels: hive-4.0.1-must
>
> For hive-jdbc, we create two jars: {{hive-jdbc}} and 
> {{hive-jdbc-standalone}} (shaded jar/uber jar).
> *Reason for change in pom:*
> Due to the changes in maven-shade-plugin after version 3.2.4, when we create a 
> shaded jar ({{hive-jdbc-standalone}}), a {{dependency-reduced-pom.xml}} 
> is generated and the dependencies that have been included in the uber JAR are 
> removed from the {{<dependencies>}} section of the generated POM to avoid 
> duplication. This {{dependency-reduced-pom.xml}} is why the dependencies are 
> removed from the POM, as it is common for both {{hive-jdbc}} and 
> {{hive-jdbc-standalone}}. So, currently for hive-jdbc, its transitive 
> dependencies are not propagated.
> Same applies to hive-beeline and hive-exec modules as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28445) Uber JAR of HiveServer2 JDBC Driver 4.1.0-SNAPSHOT is incompatible with `org.apache.zookeeper.zookeeper:3.9.2`

2024-08-20 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875085#comment-17875085
 ] 

Stamatis Zampetakis commented on HIVE-28445:


The Hive JDBC jar has way more things than necessary. Adding more things inside 
the shaded artifact is not the right approach. Optional or complementary 
features should be pluggable and not require entire libraries to be packaged in 
the same jar. From my perspective, the ideal thing would be to not have 
dependencies on zookeeper (or to have them only as optional).
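Until zookeeper can be dropped or made optional, one common mitigation for clashes like the {{org.apache.jute}} NoSuchMethodError above is to relocate the bundled copies inside the uber jar. A hedged sketch with standard maven-shade-plugin relocation syntax follows; the shaded package prefix is hypothetical:

```xml
<relocations>
  <!-- move the bundled ZooKeeper classes out of the way so they cannot
       clash with a user-supplied org.apache.zookeeper:zookeeper:3.9.2 -->
  <relocation>
    <pattern>org.apache.zookeeper</pattern>
    <shadedPattern>org.apache.hive.jdbc.shaded.org.apache.zookeeper</shadedPattern>
  </relocation>
  <relocation>
    <pattern>org.apache.jute</pattern>
    <shadedPattern>org.apache.hive.jdbc.shaded.org.apache.jute</shadedPattern>
  </relocation>
</relocations>
```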

> Uber JAR of HiveServer2 JDBC Driver 4.1.0-SNAPSHOT is incompatible with 
> `org.apache.zookeeper.zookeeper:3.9.2`
> --
>
> Key: HIVE-28445
> URL: https://issues.apache.org/jira/browse/HIVE-28445
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Priority: Major
>
> - Uber JAR of HiveServer2 JDBC Driver 4.1.0-SNAPSHOT is incompatible with 
> `org.apache.zookeeper.zookeeper:3.9.2`. This is one of the findings from 
> https://github.com/apache/shardingsphere/pull/31526 .
> - Just a simple compile.
> {code:bash}
> sdk install java 8.0.422-tem
> sdk use java 8.0.422-tem
> sdk install maven
> git clone g...@github.com:apache/hive.git
> cd ./hive/
> git reset --hard b09d76e68bfba6be19733d864b3207f95265d11f
> mvn clean install -DskipTests -T1C 
> mvn clean package -pl packaging -DskipTests -Pdocker
> cd ../
> {code}
> - Introduce the following dependencies.
> {code:xml}
> <dependency>
>     <groupId>org.apache.hive</groupId>
>     <artifactId>hive-jdbc</artifactId>
>     <version>4.1.0-SNAPSHOT</version>
>     <classifier>standalone</classifier>
> </dependency>
> <dependency>
>     <groupId>org.apache.zookeeper</groupId>
>     <artifactId>zookeeper</artifactId>
>     <version>3.9.2</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.curator</groupId>
>     <artifactId>curator-test</artifactId>
>     <version>5.7.0</version>
>     <scope>test</scope>
> </dependency>
> <dependency>
>     <groupId>org.junit.jupiter</groupId>
>     <artifactId>junit-jupiter</artifactId>
>     <version>5.10.3</version>
>     <scope>test</scope>
> </dependency>
> <dependency>
>     <groupId>org.awaitility</groupId>
>     <artifactId>awaitility</artifactId>
>     <version>4.2.0</version>
>     <scope>test</scope>
> </dependency>
> {code}
> - Start a Zookeeper Server in the unit test.
> {code:java}
> import org.apache.curator.CuratorZookeeperClient;
> import org.apache.curator.retry.ExponentialBackoffRetry;
> import org.apache.curator.test.TestingServer;
> import org.awaitility.Awaitility;
> import org.junit.jupiter.api.Test;
> import java.time.Duration;
> public class ZookeeperTest {
>     @Test
>     void testZookeeper() throws Exception {
>         TestingServer testingServer = new TestingServer();
>         try (CuratorZookeeperClient client = new CuratorZookeeperClient(
>                 testingServer.getConnectString(), 60 * 1000, 500, null,
>                 new ExponentialBackoffRetry(500, 3, 500 * 3))) {
>             client.start();
>             Awaitility.await().atMost(Duration.ofMillis(500 * 60))
>                     .ignoreExceptions().until(client::isConnected);
>         }
>     }
> }
> {code}
> - The following Error Log is obtained.
> {code:bash}
> [ERROR] 2024-08-14 13:35:55.349 [SyncThread:0] 
> o.a.z.server.ZooKeeperCriticalThread - Severe unrecoverable error, from 
> thread : SyncThread:0
> java.lang.NoSuchMethodError: 'long 
> org.apache.jute.OutputArchive.getDataSize()'
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:291)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
>   at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:672)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> [ERROR] 2024-08-14 13:35:55.373 [zkservermainrunner] 
> o.a.zookeeper.server.ZooKeeperServer - Error updating DB
> java.io.EOFException: null
>   at java.base/java.io.DataInputStream.readFully(DataInputStream.java:210)
>   at java.base/java.io.DataInputStream.readInt(DataInputStream.java:385)
>   at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
>   at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:67)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:725)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:743)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:711)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:687)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:646)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:466)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:453)
>   at 
> org.apache.zookeeper.server.persiste

[jira] [Commented] (HIVE-28455) Missing dependencies due to upgrade of maven-shade-plugin

2024-08-20 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875083#comment-17875083
 ] 

Stamatis Zampetakis commented on HIVE-28455:


When the dependencies are shaded into the main published jar, the POM should 
not include them. That's the correct and expected behavior.

> Missing dependencies due to upgrade of maven-shade-plugin
> -
>
> Key: HIVE-28455
> URL: https://issues.apache.org/jira/browse/HIVE-28455
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1, 4.1.0
>Reporter: Kokila N
>Assignee: Kokila N
>Priority: Major
>  Labels: hive-4.0.1-must
>
> For hive-jdbc, we create two jars: {{hive-jdbc}} and 
> {{hive-jdbc-standalone}} (shaded jar/uber jar).
> *Reason for change in pom:*
> Due to the changes in maven-shade-plugin after version 3.2.4, when we create a 
> shaded jar ({{hive-jdbc-standalone}}), a {{dependency-reduced-pom.xml}} 
> is generated and the dependencies that have been included in the uber JAR are 
> removed from the {{<dependencies>}} section of the generated POM to avoid 
> duplication. This {{dependency-reduced-pom.xml}} is why the dependencies are 
> removed from the POM, as it is common for both {{hive-jdbc}} and 
> {{hive-jdbc-standalone}}. So, currently for hive-jdbc, its transitive 
> dependencies are not propagated.
> Same applies to hive-beeline and hive-exec modules as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28449) Infer constant types from columns before strict type validation

2024-08-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874909#comment-17874909
 ] 

Stamatis Zampetakis commented on HIVE-28449:


The table below summarizes the behavior changes before and after PR#5374 for a 
comparison expression of the form {{{}E1 op E2{}}}, where op is a comparison 
operator (=,<,<=,>=,>,!=), E1 is a column reference expression, E2 is a 
constant/literal holding a numeric value.
||Case||E1 type||E2 type||Before||After||Example expression||
|I|BIGINT|STRING|ERROR/WARN|OK|c_bigint = '9223372036854775807'|
|II|DECIMAL|STRING|ERROR/WARN|OK|c_decimal_19_0 = '9223372036854775807'|
|III|DOUBLE|BIGINT|ERROR/WARN|OK|c_double = 9223372036854775807|

In a nutshell, the change will remove some compilation ERROR/WARNING messages 
for the above combinations but everything else (including the query plan) will 
remain unaltered.

For cases I, and II, the ERROR/WARN message is misleading since there is no 
information/precision loss on any side of the comparison.

For case III, the ERROR/WARN message is valid since the constant will be 
converted to DOUBLE and some digits will be truncated (Java long to double).

The change in PR#5374 addresses the unintentional behavior changes introduced 
by HIVE-23100 but at the same time weakens strict type checking (case III) and 
complicates the semantics of the "hive.strict.checks.type.safety" property. 
Moreover, it leads to more behavior changes (between Hive 4.0.0 and Hive 4.1.0) 
which might not be received well by all users.

Given that there are both pros and cons with the proposed changes here, I am 
more inclined to void this ticket and accept the existing behavior where strict 
type comparisons are done before any kind of type inference but I am fully open 
to other opinions as well. If the majority feels that the positives outweigh 
the negatives please leave a comment and review the PR.
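To make the three cases concrete, here is a small self-contained Java sketch (illustrative only, not Hive code) showing why cases I and II lose nothing while case III genuinely can:

```java
public class PrecisionDemo {
    public static void main(String[] args) {
        // Cases I/II: the STRING constant parses to a BIGINT (long) exactly,
        // so comparing long values loses no information.
        long fromString = Long.parseLong("9223372036854775807");
        System.out.println(fromString == Long.MAX_VALUE); // true

        // Case III: converting a BIGINT to DOUBLE can drop low-order bits,
        // because a double has only 52 bits of mantissa.
        long big = 9223372036854775806L;   // Long.MAX_VALUE - 1
        double asDouble = (double) big;    // rounds up to 2^63
        System.out.println((long) asDouble == big); // false: the round trip changes the value
    }
}
```

This is why the ERROR/WARN is misleading for cases I and II but justified for case III.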

> Infer constant types from columns before strict type validation
> ---
>
> Key: HIVE-28449
> URL: https://issues.apache.org/jira/browse/HIVE-28449
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
> Environment: 
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility
>
> HIVE-2249 introduced some [specialized type inference 
> logic|https://github.com/apache/hive/blob/5cbffb532a586226500abc498d6505722d62234d/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L972]
>  that kicks in when there are comparisons between columns and numeric 
> constant expressions.
> Consider for instance a comparison between a BIGINT column and a STRING 
> constant.
> {code:sql}
> SELECT * FROM table WHERE c_bigint = '9223372036854775807'
> {code}
> The type derivation logic will attempt to convert the STRING constant to 
> BIGINT and evaluate the expression by comparing long values.
> Currently (commit 5cbffb532a586226500abc498d6505722d62234d), the query above 
> throws the following ERROR/WARNING:
> {noformat}
> Comparing bigint and string may result in loss of information.
> {noformat}
> This is due to strict type checking (controlled via 
> hive.strict.checks.type.safety property) that is now applied before the 
> constant type inference logic described above.
> In this case, the ERROR/WARNING is a bit misleading since there is no real 
> risk for losing precision/information since the STRING constant fits into a 
> BIGINT (Java long) and the whole comparison can be evaluated without 
> precision loss.
> For quite some time, strict type checking was performed *after* constant type 
> inference (and not *before*) but the behavior was changed unintentionally by 
> HIVE-23100.
> The goal of this change is to perform constant type inference before strict 
> type validation (behavior before HIVE-23100) to restore backward 
> compatibility and remove some unnecessary warnings/errors during compilation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-19741) Update documentation to reflect list of reserved words

2024-08-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874858#comment-17874858
 ] 

Stamatis Zampetakis commented on HIVE-19741:


[~okumin] I created INFRA-26047 to request the account creation from the INFRA 
team with the following info:

Display Name: Shohei Okumiya
Username: okumin
Email: m...@okumin.com

> Update documentation to reflect list of reserved words
> --
>
> Key: HIVE-19741
> URL: https://issues.apache.org/jira/browse/HIVE-19741
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Matt Burgess
>Assignee: Shohei Okumiya
>Priority: Minor
>
> The current list of non-reserved and reserved keywords is on the Hive wiki:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords
> However it does not match the list in code (see the lexer rules here):
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
> One particular example is the "application" keyword, which was discovered 
> while trying to create a table with a column named "application".
> This Jira proposes to align the documentation with the current set of 
> non-reserved and reserved keywords.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-19741) Update documentation to reflect list of reserved words

2024-08-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874834#comment-17874834
 ] 

Stamatis Zampetakis commented on HIVE-19741:


[~okumin] Do you have a Confluence/JIRA id? If yes, please share it and I will 
give you access to modify the wiki.

> Update documentation to reflect list of reserved words
> --
>
> Key: HIVE-19741
> URL: https://issues.apache.org/jira/browse/HIVE-19741
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Matt Burgess
>Assignee: Shohei Okumiya
>Priority: Minor
>
> The current list of non-reserved and reserved keywords is on the Hive wiki:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords
> However it does not match the list in code (see the lexer rules here):
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
> One particular example is the "application" keyword, which was discovered 
> while trying to create a table with a column named "application".
> This Jira proposes to align the documentation with the current set of 
> non-reserved and reserved keywords.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28449) Infer constant types from columns before strict type validation

2024-08-16 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28449:
--

 Summary: Infer constant types from columns before strict type 
validation
 Key: HIVE-28449
 URL: https://issues.apache.org/jira/browse/HIVE-28449
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
 Environment: 

Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


HIVE-2249 introduced some [specialized type inference 
logic|https://github.com/apache/hive/blob/5cbffb532a586226500abc498d6505722d62234d/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L972]
 that kicks in when there are comparisons between columns and numeric constant 
expressions.

Consider for instance a comparison between a BIGINT column and a STRING 
constant.
{code:sql}
SELECT * FROM table WHERE c_bigint = '9223372036854775807'
{code}
The type derivation logic will attempt to convert the STRING constant to BIGINT 
and evaluate the expression by comparing long values.

Currently (commit 5cbffb532a586226500abc498d6505722d62234d), the query above 
throws the following ERROR/WARNING:
{noformat}
Comparing bigint and string may result in loss of information.
{noformat}
This is due to strict type checking (controlled via 
hive.strict.checks.type.safety property) that is now applied before the 
constant type inference logic described above.

In this case, the ERROR/WARNING is a bit misleading since there is no real risk 
for losing precision/information since the STRING constant fits into a BIGINT 
(Java long) and the whole comparison can be evaluated without precision loss.

For quite some time, strict type checking was performed *after* constant type 
inference (and not *before*) but the behavior was changed unintentionally by 
HIVE-23100.

The goal of this change is to perform constant type inference before strict 
type validation (behavior before HIVE-23100) to restore backward compatibility 
and remove some unnecessary warnings/errors during compilation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY

2024-08-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-24907:
---
Affects Version/s: (was: 4.0.0)

> Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
> ---
>
> Key: HIVE-24907
> URL: https://issues.apache.org/jira/browse/HIVE-24907
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.4.0, 3.2.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>
> The following SQL query returns wrong results when run in TEZ/LLAP:
> {code:sql}
> SET hive.auto.convert.sortmerge.join=true;
> CREATE TABLE tbl (key int,value int);
> INSERT INTO tbl VALUES (1, 2000);
> INSERT INTO tbl VALUES (2, 2001);
> INSERT INTO tbl VALUES (3, 2005);
> SELECT sub1.key, sub2.key
> FROM
>   (SELECT a.key FROM tbl a GROUP BY a.key) sub1
> LEFT OUTER JOIN (
>   SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
>   UNION
>   SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 
> ON sub1.key = sub2.key;
> {code}
> Actual results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|NULL|
> |3|NULL|
> Expected results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|2|
> |3|3|
> The issue can be reproduced with {{TestMiniLlapLocalCliDriver}} or 
> {{TestMiniTezCliDriver}} in older versions of Hive.
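The expected semantics of the repro query can be modeled in plain Python (an illustration of the relational logic, not Hive code), which shows why rows 2 and 3 should join rather than produce NULL:

```python
# (key, value) rows of tbl from the repro script.
tbl = [(1, 2000), (2, 2001), (3, 2005)]

# sub1: SELECT a.key FROM tbl a GROUP BY a.key
sub1 = sorted({k for k, _ in tbl})

# sub2: keys with value 2001 UNION keys with value 2005 (UNION de-duplicates)
sub2 = {k for k, v in tbl if v == 2001} | {k for k, v in tbl if v == 2005}

# LEFT OUTER JOIN sub1 with sub2 ON key: unmatched left rows pad with None.
result = [(k, k if k in sub2 else None) for k in sub1]
print(result)  # [(1, None), (2, 2), (3, 3)] -- matches the expected results
```

The actual Hive output in the bug report pairs every sub1 key with NULL, i.e. the join predicate never matches.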



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY

2024-08-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-24907.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
> ---
>
> Key: HIVE-24907
> URL: https://issues.apache.org/jira/browse/HIVE-24907
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.4.0, 3.2.0, 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>
> The following SQL query returns wrong results when run in TEZ/LLAP:
> {code:sql}
> SET hive.auto.convert.sortmerge.join=true;
> CREATE TABLE tbl (key int,value int);
> INSERT INTO tbl VALUES (1, 2000);
> INSERT INTO tbl VALUES (2, 2001);
> INSERT INTO tbl VALUES (3, 2005);
> SELECT sub1.key, sub2.key
> FROM
>   (SELECT a.key FROM tbl a GROUP BY a.key) sub1
> LEFT OUTER JOIN (
>   SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
>   UNION
>   SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 
> ON sub1.key = sub2.key;
> {code}
> Actual results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|NULL|
> |3|NULL|
> Expected results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|2|
> |3|3|
> The issue can be reproduced with {{TestMiniLlapLocalCliDriver}} or 
> {{TestMiniTezCliDriver}} in older versions of Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY

2024-08-15 Thread Stamatis Zampetakis (Jira)


[ https://issues.apache.org/jira/browse/HIVE-24907 ]


Stamatis Zampetakis deleted comment on HIVE-24907:


was (Author: zabetak):
Thanks for looking into this [~soumyakanti.das]. Indeed HIVE-27303 seems to be 
the right fix for the problem reported here, so we can mark this as resolved. 
Since HIVE-27303 was fixed in 4.0.0, I will assign the same fix version to this 
ticket as well.

> Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
> ---
>
> Key: HIVE-24907
> URL: https://issues.apache.org/jira/browse/HIVE-24907
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.4.0, 3.2.0, 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The following SQL query returns wrong results when run in TEZ/LLAP:
> {code:sql}
> SET hive.auto.convert.sortmerge.join=true;
> CREATE TABLE tbl (key int,value int);
> INSERT INTO tbl VALUES (1, 2000);
> INSERT INTO tbl VALUES (2, 2001);
> INSERT INTO tbl VALUES (3, 2005);
> SELECT sub1.key, sub2.key
> FROM
>   (SELECT a.key FROM tbl a GROUP BY a.key) sub1
> LEFT OUTER JOIN (
>   SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
>   UNION
>   SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 
> ON sub1.key = sub2.key;
> {code}
> Actual results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|NULL|
> |3|NULL|
> Expected results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|2|
> |3|3|
> The issue can be reproduced with {{TestMiniLlapLocalCliDriver}} or 
> {{TestMiniTezCliDriver}} in older versions of Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL

2024-08-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27303:
---
Fix Version/s: 4.0.0

> select query result is different when enable/disable mapjoin with UNION ALL
> ---
>
> Key: HIVE-27303
> URL: https://issues.apache.org/jira/browse/HIVE-27303
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> select query result is different when enable/disable mapjoin with UNION ALL
> Below are the steps to reproduce.
> As per the query, when map join is disabled it should not return duplicate 
> rows. The same works fine with map join enabled (hive.auto.convert.join=true).
> Expected result: empty rows.
> Problem: duplicate rows are returned.
> Steps:
> --
> SET hive.server2.tez.queue.access.check=true;
> SET tez.queue.name=default
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion=none;
> SET hive.execution.engine=tez;
> SET hive.stats.autogather=true;
> SET hive.server2.enable.doAs=false;
> SET hive.auto.convert.join=false;
> drop table if exists hive1_tbl_data;
> drop table if exists hive2_tbl_data;
> drop table if exists hive3_tbl_data;
> drop table if exists hive4_tbl_data;
> CREATE EXTERNAL TABLE hive1_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive2_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive3_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
>    CREATE EXTERNAL TABLE hive4_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
>  
> insert into table hive1_tbl_data select 
> '1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1';
> insert into table hive1_tbl_data select 
> '2','john','doe','j...@hotmail.com','2014-01-01 
> 12:01:02','4000-1';insert into table hive2_tbl_data select 
> '1','john','do

[jira] [Commented] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY

2024-08-15 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873845#comment-17873845
 ] 

Stamatis Zampetakis commented on HIVE-24907:


Thanks for looking into this [~soumyakanti.das]. Indeed HIVE-27303 seems to be 
the right fix for the problem reported here, so we can mark this as resolved. 
Since HIVE-27303 was fixed in 4.0.0, I will assign the same fix version to this 
ticket as well.

> Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
> ---
>
> Key: HIVE-24907
> URL: https://issues.apache.org/jira/browse/HIVE-24907
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.4.0, 3.2.0, 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The following SQL query returns wrong results when run in TEZ/LLAP:
> {code:sql}
> SET hive.auto.convert.sortmerge.join=true;
> CREATE TABLE tbl (key int,value int);
> INSERT INTO tbl VALUES (1, 2000);
> INSERT INTO tbl VALUES (2, 2001);
> INSERT INTO tbl VALUES (3, 2005);
> SELECT sub1.key, sub2.key
> FROM
>   (SELECT a.key FROM tbl a GROUP BY a.key) sub1
> LEFT OUTER JOIN (
>   SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
>   UNION
>   SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 
> ON sub1.key = sub2.key;
> {code}
> Actual results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|NULL|
> |3|NULL|
> Expected results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|2|
> |3|3|
> The issue can be reproduced with {{TestMiniLlapLocalCliDriver}} or 
> {{TestMiniTezCliDriver}} in older versions of Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY

2024-08-15 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873846#comment-17873846
 ] 

Stamatis Zampetakis commented on HIVE-24907:


Thanks for looking into this [~soumyakanti.das]. Indeed HIVE-27303 seems to be 
the right fix for the problem reported here, so we can mark this as resolved. 
Since HIVE-27303 was fixed in 4.0.0, I will assign the same fix version to this 
ticket as well.

> Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
> ---
>
> Key: HIVE-24907
> URL: https://issues.apache.org/jira/browse/HIVE-24907
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.4.0, 3.2.0, 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The following SQL query returns wrong results when run in TEZ/LLAP:
> {code:sql}
> SET hive.auto.convert.sortmerge.join=true;
> CREATE TABLE tbl (key int,value int);
> INSERT INTO tbl VALUES (1, 2000);
> INSERT INTO tbl VALUES (2, 2001);
> INSERT INTO tbl VALUES (3, 2005);
> SELECT sub1.key, sub2.key
> FROM
>   (SELECT a.key FROM tbl a GROUP BY a.key) sub1
> LEFT OUTER JOIN (
>   SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
>   UNION
>   SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 
> ON sub1.key = sub2.key;
> {code}
> Actual results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|NULL|
> |3|NULL|
> Expected results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|2|
> |3|3|
> The issue can be reproduced with {{TestMiniLlapLocalCliDriver}} or 
> {{TestMiniTezCliDriver}} in older versions of Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28423) The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing the requirement statement for `hive.server2.support.dynamic.service.discovery`

2024-08-12 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872833#comment-17872833
 ] 

Stamatis Zampetakis commented on HIVE-28423:


Hey [~linghengqian] , I gave you permissions to edit the wiki. Please check if 
you are able to modify the desired pages.

Unfortunately there is no PR/review model for wiki changes so what you did here 
is perfect.

I checked the proposed suggestions and they make sense to me. I cannot really 
test these at the moment but I am OK to update the wiki per your suggestions.

Once you are done, please mark this ticket as resolved.

> The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing 
> the requirement statement for `hive.server2.support.dynamic.service.discovery`
> -
>
> Key: HIVE-28423
> URL: https://issues.apache.org/jira/browse/HIVE-28423
> Project: Hive
>  Issue Type: Improvement
>Reporter: Qiheng He
>Assignee: Qiheng He
>Priority: Major
>
> - The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing 
> the requirement statement for 
> *hive.server2.support.dynamic.service.discovery*. This is a documentation 
> issue I noticed at [https://github.com/dbeaver/dbeaver/issues/22777] , where 
> dbeaver contributors spent 6 months trying to figure out how to start 
> ZooKeeper Service Discovery on HiveServer2.
> - 
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLWhenZooKeeperServiceDiscoveryIsEnabled
>  describes ZooKeeper Service Discovery like this.
> {code:bash}
> ZooKeeper-based service discovery introduced in Hive 0.14.0 (HIVE-7935) 
> enables high availability and rolling upgrade for HiveServer2. A JDBC URL 
> that specifies <zookeeper quorum> needs to be used to make use of these 
> features.
> With further changes in Hive 2.0.0 and 1.3.0 (unreleased, HIVE-11581), none 
> of the additional configuration parameters such as authentication mode, 
> transport mode, or SSL parameters need to be specified, as they are retrieved 
> from the ZooKeeper entries along with the hostname.
> The JDBC connection URL: jdbc:hive2://<zookeeper 
> quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 .
> The <zookeeper quorum> is the same as the value of the hive.zookeeper.quorum 
> configuration parameter in hive-site.xml/hiveserver2-site.xml used by 
> HiveServer2.
> {code}
> - I did a test at https://github.com/linghengqian/hivesever2-v400-sd-test to 
> verify that setting *hive.zookeeper.quorum* only on HiveServer2 was not 
> enough. I found the *hive.server2.support.dynamic.service.discovery* property 
> defined in the *org.apache.hadoop.hive.conf.HiveConf* class in a 
> stackoverflow discussion. 
> - To verify this, clone the git repo and execute the following shell 
> commands. The related unit tests occupy ports *2181*, *1*, and *10002* to 
> start the Docker containers.
> {code:bash}
> sdk install java 22.0.2-graalce
> sdk use java 22.0.2-graalce
> git clone g...@github.com:linghengqian/hivesever2-v400-sd-test.git
> cd ./hivesever2-v400-sd-test/
> docker compose -f ./docker-compose-lingh.yml pull
> docker compose -f ./docker-compose-lingh.yml up -d
> # ... Wait five seconds for HiveServer2 to finish initializing.
> ./mvnw clean test
> docker compose -f ./docker-compose-lingh.yml down
> {code}
> - I also searched for the keyword 
> *hive.server2.support.dynamic.service.discovery* in https://cwiki.apache.org/ 
> , but I could only find this property in the documentation page of the KNOX 
> project 
> https://cwiki.apache.org/confluence/display/KNOX/Dynamic+HA+Provider+Configuration
>  , which doesn't make sense from my perspective.
> - From my perspective, it is reasonable to add the description of 
> *hive.server2.support.dynamic.service.discovery* properties to the 
> documentation of apache/hive:4.0.0.
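For illustration, the discovery-mode JDBC URL described in the quoted wiki text can be assembled like this (plain Python sketch; the `build_discovery_url` helper is hypothetical, and HiveServer2 must additionally have hive.server2.support.dynamic.service.discovery=true per this ticket):

```python
def build_discovery_url(zookeeper_quorum: str, namespace: str = "hiveserver2") -> str:
    """Build the HiveServer2 JDBC URL for ZooKeeper service discovery.

    zookeeper_quorum must match the hive.zookeeper.quorum value used by
    HiveServer2 (e.g. "zk1:2181,zk2:2181,zk3:2181").
    """
    return (f"jdbc:hive2://{zookeeper_quorum}/;"
            f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}")

# Example quorum for illustration only.
url = build_discovery_url("zk1:2181,zk2:2181,zk3:2181")
print(url)
```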



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28423) The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing the requirement statement for `hive.server2.support.dynamic.service.discovery`

2024-08-12 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-28423:
--

Assignee: Qiheng He

> The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing 
> the requirement statement for `hive.server2.support.dynamic.service.discovery`
> -
>
> Key: HIVE-28423
> URL: https://issues.apache.org/jira/browse/HIVE-28423
> Project: Hive
>  Issue Type: Improvement
>Reporter: Qiheng He
>Assignee: Qiheng He
>Priority: Major
>
> - The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing 
> the requirement statement for 
> *hive.server2.support.dynamic.service.discovery*. This is a documentation 
> issue I noticed at [https://github.com/dbeaver/dbeaver/issues/22777] , where 
> dbeaver contributors spent 6 months trying to figure out how to start 
> ZooKeeper Service Discovery on HiveServer2.
> - 
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLWhenZooKeeperServiceDiscoveryIsEnabled
>  describes ZooKeeper Service Discovery like this.
> {code:bash}
> ZooKeeper-based service discovery introduced in Hive 0.14.0 (HIVE-7935) 
> enables high availability and rolling upgrade for HiveServer2. A JDBC URL 
> that specifies <zookeeper quorum> needs to be used to make use of these 
> features.
> With further changes in Hive 2.0.0 and 1.3.0 (unreleased, HIVE-11581), none 
> of the additional configuration parameters such as authentication mode, 
> transport mode, or SSL parameters need to be specified, as they are retrieved 
> from the ZooKeeper entries along with the hostname.
> The JDBC connection URL: jdbc:hive2://<zookeeper 
> quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 .
> The <zookeeper quorum> is the same as the value of the hive.zookeeper.quorum 
> configuration parameter in hive-site.xml/hiveserver2-site.xml used by 
> HiveServer2.
> {code}
> - I did a test at https://github.com/linghengqian/hivesever2-v400-sd-test to 
> verify that setting *hive.zookeeper.quorum* only on HiveServer2 was not 
> enough. I found the *hive.server2.support.dynamic.service.discovery* property 
> defined in the *org.apache.hadoop.hive.conf.HiveConf* class in a 
> stackoverflow discussion. 
> - To verify this git, just execute the following shell. Related unit tests 
> occupy *2181*, *1*, *10002* ports to start Docker Container.
> {code:bash}
> sdk install java 22.0.2-graalce
> sdk use java 22.0.2-graalce
> git clone g...@github.com:linghengqian/hivesever2-v400-sd-test.git
> cd ./hivesever2-v400-sd-test/
> docker compose -f ./docker-compose-lingh.yml pull
> docker compose -f ./docker-compose-lingh.yml up -d
> # ... Wait five seconds for HiveServer2 to finish initializing.
> ./mvnw clean test
> docker compose -f ./docker-compose-lingh.yml down
> {code}
> - I also searched for the keyword 
> *hive.server2.support.dynamic.service.discovery* in https://cwiki.apache.org/ 
> , but I could only find this property in the documentation page of the KNOX 
> project 
> https://cwiki.apache.org/confluence/display/KNOX/Dynamic+HA+Provider+Configuration
>  , which doesn't make sense from my perspective.
> - From my perspective, it is reasonable to add the description of 
> *hive.server2.support.dynamic.service.discovery* properties to the 
> documentation of apache/hive:4.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline

2024-08-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28401.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
[https://github.com/apache/hive/commit/90165d76826439cbad38e10eb126e8710ffc1d28]

Thanks for the review [~asolimando] !

> Drop redundant XML test report post-processing from CI pipeline
> ---
>
> Key: HIVE-28401
> URL: https://issues.apache.org/jira/browse/HIVE-28401
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The [Maven Surefire 
> plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin]
>  generates an XML report containing various information regarding the 
> execution of tests. In case of failures the system-out and system-err output 
> from the test is saved in the XML file.
> The Jenkins pipeline has a post-processing 
> [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380]
>  that attempts to remove the system-out and system-err entries from the XML 
> files generated by Surefire for all tests that passed as an attempt to save 
> disk space in the Jenkins node.
> {code:bash}
> # removes all stdout and err for passed tests
> xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 
> 'testsuite/testcase/system-err[count(../failure)=0]' 
> {code}
> This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing 
> system-out and system-err for tests that passed. 
> Moreover, when the XML report file is large xmlstarlet chokes and throws a 
> "Huge input lookup" error that skips the remaining post-processing steps and 
> makes the build fail.
> {noformat}
> [2024-07-23T16:11:26.052Z] 
> ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2:
>  internal error: Huge input lookup
> [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799  INFO 
> [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio
> [2024-07-23T16:11:26.053Z]  ^
> [2024-07-23T16:11:43.133Z] Recording test results
> [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found.
> script returned exit code 3
> {noformat}
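For reference, the xmlstarlet cleanup quoted above can be expressed with Python's standard-library XML module (a rough equivalent for illustration, not the actual CI script): drop `system-out`/`system-err` from every `testcase` that has no `failure` child.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a Surefire report: one passed and one failed testcase.
REPORT = """<testsuite>
  <testcase name="ok"><system-out>noise</system-out></testcase>
  <testcase name="bad"><failure/><system-out>keep me</system-out></testcase>
</testsuite>"""

root = ET.fromstring(REPORT)
for case in root.findall("testcase"):
    if case.find("failure") is None:  # test passed: strip its captured output
        for tag in ("system-out", "system-err"):
            for el in case.findall(tag):
                case.remove(el)

cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)  # passed testcase keeps no output; failed one keeps "keep me"
```

Unlike xmlstarlet's stream parser, this loads the whole file into memory, so it would trade the "Huge input lookup" failure mode for a memory cost on very large reports.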



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28376) Remove unused Hive object from RelOptHiveTable

2024-08-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28376.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
[https://github.com/apache/hive/commit/59e8f0d9eac8fce6a0586ef1a3deef53a774c86a]

Thanks for the reviews [~simhadri-g] and [~kokila19] !

> Remove unused Hive object from RelOptHiveTable
> --
>
> Key: HIVE-28376
> URL: https://issues.apache.org/jira/browse/HIVE-28376
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The 
> [Hive|https://github.com/apache/hive/blob/b18d5732b4f309fdc3b8226847c9c1ebcd2476fd/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java]
>  object is not used inside RelOptHiveTable so keeping a reference to it is 
> wasting memory and also complicates creation of RelOptHiveTable objects 
> (constructor parameter). 
> Moreover, the Hive objects have thread local scope so in general they 
> shouldn't be passed around cause their lifecycle becomes harder to manage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28423) The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing the requirement statement for `hive.server2.support.dynamic.service.discovery`

2024-08-01 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870187#comment-17870187
 ] 

Stamatis Zampetakis commented on HIVE-28423:


[~linghengqian] can you clarify what is exactly the modification that you are 
proposing? 

If you want to contribute to the wiki yourself please create an account and 
give me your username so that I can give you the appropriate permissions.

> The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing 
> the requirement statement for `hive.server2.support.dynamic.service.discovery`
> -
>
> Key: HIVE-28423
> URL: https://issues.apache.org/jira/browse/HIVE-28423
> Project: Hive
>  Issue Type: Improvement
>Reporter: Qiheng He
>Priority: Major
>
> - The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing 
> the requirement statement for 
> *hive.server2.support.dynamic.service.discovery*. This is a documentation 
> issue I noticed at [https://github.com/dbeaver/dbeaver/issues/22777] , where 
> dbeaver contributors spent 6 months trying to figure out how to start 
> ZooKeeper Service Discovery on HiveServer2.
> - 
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLWhenZooKeeperServiceDiscoveryIsEnabled
>  describes ZooKeeper Service Discovery like this.
> {code:bash}
> ZooKeeper-based service discovery introduced in Hive 0.14.0 (HIVE-7935) 
> enables high availability and rolling upgrade for HiveServer2. A JDBC URL 
> that specifies <zookeeper quorum> needs to be used to make use of these 
> features.
> With further changes in Hive 2.0.0 and 1.3.0 (unreleased, HIVE-11581), none 
> of the additional configuration parameters such as authentication mode, 
> transport mode, or SSL parameters need to be specified, as they are retrieved 
> from the ZooKeeper entries along with the hostname.
> The JDBC connection URL: jdbc:hive2://<zookeeper 
> quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 .
> The <zookeeper quorum> is the same as the value of the hive.zookeeper.quorum 
> configuration parameter in hive-site.xml/hiveserver2-site.xml used by 
> HiveServer2.
> {code}
> - I did a test at https://github.com/linghengqian/hivesever2-v400-sd-test to 
> verify that setting *hive.zookeeper.quorum* only on HiveServer2 was not 
> enough. I found the *hive.server2.support.dynamic.service.discovery* property 
> defined in the *org.apache.hadoop.hive.conf.HiveConf* class in a 
> stackoverflow discussion. 
> - To verify this, clone the git repo and execute the following shell 
> commands. The related unit tests occupy ports *2181*, *1*, and *10002* to 
> start the Docker containers.
> {code:bash}
> sdk install java 22.0.2-graalce
> sdk use java 22.0.2-graalce
> git clone git@github.com:linghengqian/hivesever2-v400-sd-test.git
> cd ./hivesever2-v400-sd-test/
> docker compose -f ./docker-compose-lingh.yml pull
> docker compose -f ./docker-compose-lingh.yml up -d
> # ... Wait five seconds for HiveServer2 to finish initializing.
> ./mvnw clean test
> docker compose -f ./docker-compose-lingh.yml down
> {code}
> - I also searched for the keyword 
> *hive.server2.support.dynamic.service.discovery* in https://cwiki.apache.org/ 
> , but I could only find this property in the documentation page of the KNOX 
> project 
> https://cwiki.apache.org/confluence/display/KNOX/Dynamic+HA+Provider+Configuration
>  , which doesn't make sense from my perspective.
> - From my perspective, it is reasonable to add a description of the 
> *hive.server2.support.dynamic.service.discovery* property to the 
> documentation of apache/hive:4.0.0.
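
For reference, a minimal hive-site.xml sketch of the server-side settings discussed above (the quorum hostnames are placeholders, not values from the linked test repository):

```xml
<configuration>
  <!-- Required for HiveServer2 to register itself in ZooKeeper;
       without it, setting the quorum alone is not enough. -->
  <property>
    <name>hive.server2.support.dynamic.service.discovery</name>
    <value>true</value>
  </property>
  <!-- The same quorum value is then used in the client JDBC URL. -->
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
</configuration>
```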



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28425) Document CAST FORMAT function in the wiki

2024-08-01 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28425:
--

 Summary: Document CAST FORMAT function in the wiki
 Key: HIVE-28425
 URL: https://issues.apache.org/jira/browse/HIVE-28425
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Stamatis Zampetakis


The CAST(<expression> AS <type> FORMAT <pattern>) function has been implemented in 
HIVE-21575 but does not appear in the respective page in the wiki:

https://cwiki.apache.org/confluence/display/Hive/Hive+UDFs#HiveUDFs-TypeConversionFunctions
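
For illustration, the syntax to be documented looks roughly like this (the format strings follow the SQL:2016 datetime patterns adopted by HIVE-21575; the sample values are made up):

```sql
-- String to timestamp with an explicit format
SELECT CAST('01-05-2017' AS TIMESTAMP FORMAT 'DD-MM-YYYY');
-- Timestamp to string with an explicit format
SELECT CAST(current_timestamp() AS STRING FORMAT 'YYYY-MM-DD');
```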



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline

2024-07-31 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869893#comment-17869893
 ] 

Stamatis Zampetakis commented on HIVE-28401:


Since we are removing code, it's not easy to find the right place to add a 
comment. How about documenting it in the commit message?

> Drop redundant XML test report post-processing from CI pipeline
> ---
>
> Key: HIVE-28401
> URL: https://issues.apache.org/jira/browse/HIVE-28401
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> The [Maven Surefire 
> plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin]
>  generates an XML report containing various information regarding the 
> execution of tests. In case of failures the system-out and system-err output 
> from the test is saved in the XML file.
> The Jenkins pipeline has a post-processing 
> [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380]
>  that attempts to remove the system-out and system-err entries from the XML 
> files generated by Surefire for all tests that passed as an attempt to save 
> disk space in the Jenkins node.
> {code:bash}
> # removes all stdout and err for passed tests
> xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 
> 'testsuite/testcase/system-err[count(../failure)=0]' 
> {code}
> This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing 
> system-out and system-err for tests that passed. 
> Moreover, when the XML report file is large xmlstarlet chokes and throws a 
> "Huge input lookup" error that skips the remaining post-processing steps and 
> makes the build fail.
> {noformat}
> [2024-07-23T16:11:26.052Z] 
> ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2:
>  internal error: Huge input lookup
> [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799  INFO 
> [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio
> [2024-07-23T16:11:26.053Z]  ^
> [2024-07-23T16:11:43.133Z] Recording test results
> [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found.
> script returned exit code 3
> {noformat}
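
The xmlstarlet deletion quoted above can be mimicked in a few lines of Python; a sketch (the toy report content is made up, and real Surefire reports carry many more attributes):

```python
import xml.etree.ElementTree as ET

# Toy Surefire-style report matching the XPath in the cleanup step.
report = """<testsuite>
  <testcase name="passed"><system-out>lots of output</system-out></testcase>
  <testcase name="failed"><failure/><system-out>kept for debugging</system-out></testcase>
</testsuite>"""

root = ET.fromstring(report)
for case in root.findall("testcase"):
    if case.find("failure") is None:  # no <failure> child => the test passed
        for tag in ("system-out", "system-err"):
            elem = case.find(tag)
            if elem is not None:
                case.remove(elem)  # drop captured output for passed tests

cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)
```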



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28420) Incorrect URL to Hive mailing lists in HowToContribute Wiki page

2024-07-31 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-28420:
---
Component/s: Documentation

> Incorrect URL to Hive mailing lists in HowToContribute Wiki page
> 
>
> Key: HIVE-28420
> URL: https://issues.apache.org/jira/browse/HIVE-28420
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Qiheng He
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: Not Applicable
>
>
> - HowToContribute Wiki page uses a non-existent URL. See 
> [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] .
> {code:bash}
> Stay Involved
> Contributors should join the Hive mailing lists. In particular the dev list 
> (to join discussions of changes) and the user list (to help others).
> {code}
>  - Clicking on `Hive mailing lists` takes me to 
> [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28420) HowToContribute Wiki page uses a non-existent URL

2024-07-31 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869883#comment-17869883
 ] 

Stamatis Zampetakis commented on HIVE-28420:


Thanks for reporting this [~linghengqian] . It is now fixed, please test and 
let me know how it looks.

> HowToContribute Wiki page uses a non-existent URL
> -
>
> Key: HIVE-28420
> URL: https://issues.apache.org/jira/browse/HIVE-28420
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> - HowToContribute Wiki page uses a non-existent URL. See 
> [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] .
> {code:bash}
> Stay Involved
> Contributors should join the Hive mailing lists. In particular the dev list 
> (to join discussions of changes) and the user list (to help others).
> {code}
>  - Clicking on `Hive mailing lists` takes me to 
> [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28420) Incorrect URL to Hive mailing lists in HowToContribute Wiki page

2024-07-31 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28420.

Fix Version/s: Not Applicable
   Resolution: Fixed

> Incorrect URL to Hive mailing lists in HowToContribute Wiki page
> 
>
> Key: HIVE-28420
> URL: https://issues.apache.org/jira/browse/HIVE-28420
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: Not Applicable
>
>
> - HowToContribute Wiki page uses a non-existent URL. See 
> [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] .
> {code:bash}
> Stay Involved
> Contributors should join the Hive mailing lists. In particular the dev list 
> (to join discussions of changes) and the user list (to help others).
> {code}
>  - Clicking on `Hive mailing lists` takes me to 
> [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28420) Incorrect URL to Hive mailing lists in HowToContribute Wiki page

2024-07-31 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-28420:
---
Summary: Incorrect URL to Hive mailing lists in HowToContribute Wiki page  
(was: HowToContribute Wiki page uses a non-existent URL)

> Incorrect URL to Hive mailing lists in HowToContribute Wiki page
> 
>
> Key: HIVE-28420
> URL: https://issues.apache.org/jira/browse/HIVE-28420
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> - HowToContribute Wiki page uses a non-existent URL. See 
> [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] .
> {code:bash}
> Stay Involved
> Contributors should join the Hive mailing lists. In particular the dev list 
> (to join discussions of changes) and the user list (to help others).
> {code}
>  - Clicking on `Hive mailing lists` takes me to 
> [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28420) HowToContribute Wiki page uses a non-existent URL

2024-07-31 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-28420:
--

Assignee: Stamatis Zampetakis

> HowToContribute Wiki page uses a non-existent URL
> -
>
> Key: HIVE-28420
> URL: https://issues.apache.org/jira/browse/HIVE-28420
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> - HowToContribute Wiki page uses a non-existent URL. See 
> [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] .
> {code:bash}
> Stay Involved
> Contributors should join the Hive mailing lists. In particular the dev list 
> (to join discussions of changes) and the user list (to help others).
> {code}
>  - Clicking on `Hive mailing lists` takes me to 
> [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline

2024-07-31 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869880#comment-17869880
 ] 

Stamatis Zampetakis commented on HIVE-28401:


I was checking the content of test-results.tgz that was generated after 
running precommit tests for #5364 and it seems that a few .xml files still 
contain system-out and system-err entries. All these entries correspond to 
tests that are *skipped*.

The above findings show that the XML post-processing step is not completely 
redundant. If we remove it now, we will be storing a bit more information in 
the test results. However, the difference in test results archive size between 
[PR-5364|https://ci.hive.apache.org/job/hive-precommit/job/PR-5364/2/artifact/test-results.tgz]
 (21.75MB) and 
[master|https://ci.hive.apache.org/job/hive-precommit/job/master/2238/artifact/test-results.tgz]
 (20.35MB) is small: when compressed, the system-out/err from skipped tests 
consume about 6.5% (~1.5MB) more space.
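
A quick back-of-the-envelope check of the quoted archive sizes:

```python
# Compressed test-results.tgz sizes in MB, taken from the linked artifacts.
pr_5364 = 21.75
master = 20.35

extra = pr_5364 - master             # absolute overhead of keeping the entries
pct = extra / pr_5364 * 100          # overhead relative to the larger archive
print(f"{extra:.2f} MB extra, about {pct:.0f}% of the archive")
```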

I feel that removing the custom logic at the expense of 1.5MB of extra space is 
an acceptable trade-off, given that we will get rid of the "Huge input lookup" 
error. [~asolimando], since this information was not present when you initially 
reviewed the PR, I would like to hear your thoughts about this.

Alternatively, to keep the behavior identical to before, we could wait and hope 
for SUREFIRE-2254 to be implemented, although I don't find that necessary.

> Drop redundant XML test report post-processing from CI pipeline
> ---
>
> Key: HIVE-28401
> URL: https://issues.apache.org/jira/browse/HIVE-28401
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> The [Maven Surefire 
> plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin]
>  generates an XML report containing various information regarding the 
> execution of tests. In case of failures the system-out and system-err output 
> from the test is saved in the XML file.
> The Jenkins pipeline has a post-processing 
> [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380]
>  that attempts to remove the system-out and system-err entries from the XML 
> files generated by Surefire for all tests that passed as an attempt to save 
> disk space in the Jenkins node.
> {code:bash}
> # removes all stdout and err for passed tests
> xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 
> 'testsuite/testcase/system-err[count(../failure)=0]' 
> {code}
> This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing 
> system-out and system-err for tests that passed. 
> Moreover, when the XML report file is large xmlstarlet chokes and throws a 
> "Huge input lookup" error that skips the remaining post-processing steps and 
> makes the build fail.
> {noformat}
> [2024-07-23T16:11:26.052Z] 
> ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2:
>  internal error: Huge input lookup
> [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799  INFO 
> [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio
> [2024-07-23T16:11:26.053Z]  ^
> [2024-07-23T16:11:43.133Z] Recording test results
> [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found.
> script returned exit code 3
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28415) Disable Develocity build scans when not used

2024-07-29 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28415:
--

 Summary: Disable Develocity build scans when not used
 Key: HIVE-28415
 URL: https://issues.apache.org/jira/browse/HIVE-28415
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Reporter: Stamatis Zampetakis


Develocity build scans have been introduced by HIVE-28303 and now they run on 
every invocation of the mvn command. However, the results of the scans are only 
published in 
[ge.apache.org|https://ge.apache.org/scans?search.relativeStartTime=P28D&search.rootProjectNames=hive]
 by very specific [CI 
actions|https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/.github/workflows/build.yml#L31].

The build analysis adds noticeable overhead to build times and resources (CPU 
& memory), so it shouldn't be active by default because most of the time it is 
not used. The analysis should only take place when we want to publish the result 
(CI action) or when it is explicitly requested by a developer.

The build scans were also responsible for some OOM errors in CI (HIVE-28402) 
since they require more memory than a regular build.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28402) Precommit tests fail with OOM when running split-19

2024-07-29 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869325#comment-17869325
 ] 

Stamatis Zampetakis commented on HIVE-28402:


I can reproduce the problem locally by running the following command:

{noformat}
export MAVEN_OPTS=-Xmx2g
mvn test 
-Dtest=TestExchangePartitions,TestAlterPartitions,TestFunctions,TestGetPartitions
 -s ~/.m2/settings.xml -Dtest.groups=
{noformat}

Using the ps command I could see that the main maven {{Launcher}} process was 
occupying 2.5GB of RSS memory before hitting an OOM. Note that the {{Launcher}} 
process is the one performing the build and analysis and not the one running 
the tests.
{noformat}
$ ps -eo pid,rss,cmd | grep Launcher
 103750 2532640 /opt/jdks/jdk1.8.0_261/bin/java -Xmx2g -classpath 
/home/stamatis/Programs/apache-maven-3.6.3/boot/plexus-classworlds-2.6.0.jar 
-Dclassworlds.conf=/home/stamatis/Programs/apache-maven-3.6.3/bin/m2.conf 
-Dmaven.home=/home/stamatis/Programs/apache-maven-3.6.3 
-Dlibrary.jansi.path=/home/stamatis/Programs/apache-maven-3.6.3/lib/jansi-native
 -Dmaven.multiModuleProjectDirectory=/home/stamatis/Projects/Apache/hive 
org.codehaus.plexus.classworlds.launcher.Launcher test 
-Dtest=TestExchangePartitions,TestAlterPartitions,TestFunctions,TestGetPartitions
 -s /home/stamatis/.m2/settings.xml -Dtest.groups=
{noformat}
Note that if the heap is not restricted using MAVEN_OPTS, the build will pass 
since the heap is able to grow more.

As Zhihua mentioned, the heap dump analysis shows that we have 2.4M instances 
of com.gradle.scan.eventmodel.maven.MvnTestOutput_1_0 class which retain ~1.2GB 
of heap space. The majority of space is occupied by the [message 
content|https://docs.gradle.com/enterprise/event-model-javadoc/com/gradle/scan/eventmodel/maven/MvnTestOutput_1_0.html]
 of the event. From the Javadoc of this class ("An EventData that is published 
when a test writes to the standard output or standard error during test 
execution.") we can infer that writes to standard out/err generate events, and 
until we publish them they remain in memory. This means that tests which write 
a lot to system out/err will accumulate data in the heap.
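
A toy model of why the events accumulate (purely illustrative; these are not the actual Develocity event classes):

```python
# Output events are buffered until publish, so heap use grows with the
# volume of stdout/stderr the tests produce.
buffered_events = []

def on_test_output(line):
    buffered_events.append(line)   # retained in memory until publish()

def publish():
    count = len(buffered_events)
    buffered_events.clear()        # memory is released only at publish time
    return count

for i in range(100_000):
    on_test_output(f"noisy test log line {i}")
print(publish())
```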


> Precommit tests fail with OOM when running split-19
> ---
>
> Key: HIVE-28402
> URL: https://issues.apache.org/jira/browse/HIVE-28402
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-07-29-18-06-33-250.png, 
> image-2024-07-29-18-11-37-046.png, image-2024-07-29-18-17-58-271.png
>
>
> The last 3 runs in master all fail with OOM when running split-19:
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2233/pipeline]
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2234/pipeline]
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2235/pipeline]
> {noformat}
> [2024-07-25T05:57:46.816Z] [INFO] Running 
> org.apache.hadoop.hive.metastore.client.TestGetPartitions
> [2024-07-25T06:00:23.926Z] Exception in thread "Thread-46" 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> [2024-07-25T06:00:23.926Z]at 
> java.util.Arrays.copyOfRange(Arrays.java:3664)
> [2024-07-25T06:00:23.926Z]at java.lang.String.<init>(String.java:207)
> [2024-07-25T06:00:23.926Z]at 
> java.io.BufferedReader.readLine(BufferedReader.java:356)
> [2024-07-25T06:00:24.907Z]at 
> java.io.BufferedReader.readLine(BufferedReader.java:389)
> [2024-07-25T06:00:24.907Z]at 
> org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:89)
> [2024-07-25T06:01:46.664Z] [WARNING] ForkStarter IOException: GC overhead 
> limit exceeded. See the dump file 
> /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream
> [2024-07-25T06:01:55.003Z] [INFO] Running 
> org.apache.hadoop.hive.metastore.TestFilterHooks
> [2024-07-25T06:02:21.747Z] 
> [2024-07-25T06:02:21.748Z] Exception: java.lang.OutOfMemoryError thrown from 
> the UncaughtExceptionHandler in thread "Thread-49"
> [2024-07-25T06:03:08.707Z] [WARNING] ForkStarter IOException: GC overhead 
> limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC

[jira] [Commented] (HIVE-28402) Precommit tests fail with OOM when running split-19

2024-07-29 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869282#comment-17869282
 ] 

Stamatis Zampetakis commented on HIVE-28402:


Many thanks for working on this [~dengzh]. Can you please add a few details 
here about the root cause and the proposed solution, for future reference?

> Precommit tests fail with OOM when running split-19
> ---
>
> Key: HIVE-28402
> URL: https://issues.apache.org/jira/browse/HIVE-28402
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>
> The last 3 runs in master all fail with OOM when running split-19:
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2233/pipeline]
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2234/pipeline]
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2235/pipeline]
> {noformat}
> [2024-07-25T05:57:46.816Z] [INFO] Running 
> org.apache.hadoop.hive.metastore.client.TestGetPartitions
> [2024-07-25T06:00:23.926Z] Exception in thread "Thread-46" 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> [2024-07-25T06:00:23.926Z]at 
> java.util.Arrays.copyOfRange(Arrays.java:3664)
> [2024-07-25T06:00:23.926Z]at java.lang.String.<init>(String.java:207)
> [2024-07-25T06:00:23.926Z]at 
> java.io.BufferedReader.readLine(BufferedReader.java:356)
> [2024-07-25T06:00:24.907Z]at 
> java.io.BufferedReader.readLine(BufferedReader.java:389)
> [2024-07-25T06:00:24.907Z]at 
> org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:89)
> [2024-07-25T06:01:46.664Z] [WARNING] ForkStarter IOException: GC overhead 
> limit exceeded. See the dump file 
> /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream
> [2024-07-25T06:01:55.003Z] [INFO] Running 
> org.apache.hadoop.hive.metastore.TestFilterHooks
> [2024-07-25T06:02:21.747Z] 
> [2024-07-25T06:02:21.748Z] Exception: java.lang.OutOfMemoryError thrown from 
> the UncaughtExceptionHandler in thread "Thread-49"
> [2024-07-25T06:03:08.707Z] [WARNING] ForkStarter IOException: GC overhead 
> limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded. See the dump file 
> /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream
> [2024-07-25T06:03:15.362Z] [ERROR] Error closing test event listener:
> [2024-07-25T06:03:15.362Z] java.util.concurrent.CompletionException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.CompletableFuture.encodeThrowable 
> (CompletableFuture.java:273)
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.CompletableFuture.completeThrowable 
> (CompletableFuture.java:280)
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.CompletableFuture$AsyncRun.run 
> (CompletableFuture.java:1643)
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
> [2024-07-25T06:03:15.362Z] at java.lang.Thread.run (Thread.java:748)
> [2024-07-25T06:03:15.362Z] Caused by: java.lang.OutOfMemoryError: GC overhead 
> limit exceeded
> [2024-07-25T06:03:15.363Z] [ERROR] GC overhead limit exceeded -> [Help 1]
> {noformat}
> The OOM is also affecting PR runs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28403) Delete redundant Javadoc for Hive

2024-07-26 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868899#comment-17868899
 ] 

Stamatis Zampetakis commented on HIVE-28403:


We are happy to have you on board.

Just to clarify, a JIRA is required for every (PR) contribution. I just wanted 
to highlight that we should be mindful when opening JIRAs/PRs and weigh the 
pros/cons.

> Delete redundant Javadoc for Hive
> -
>
> Key: HIVE-28403
> URL: https://issues.apache.org/jira/browse/HIVE-28403
> Project: Hive
>  Issue Type: Wish
>Reporter: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: Not Applicable
>
>
> Hive has some redundant Javadoc, but there are no comments in it. I think 
> some Javadoc can be deleted.
> {code:java}
> // Some comments here
>   /**
>*
>*/
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28403) Delete redundant Javadoc for Hive

2024-07-26 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868897#comment-17868897
 ] 

Stamatis Zampetakis commented on HIVE-28403:


Hey [~caicancai], thanks for working on this. I really appreciate the time 
you took to contribute this PR, but I feel that these kinds of changes have 
more negatives than positives for the project.

+Negatives:+
* Consume CI resources (runs are limited in Hive so this PR may block others 
from running)
* Increase likelihood of merge conflicts during backports
* Consume reviewers time (for checking and merging)
* Consume contributors time (they could spend their time on more impactful 
changes)
* Additional JIRA/git/mailing list traffic

+Positives:+
* Minor reduction in code size
* Other?

The above is my personal viewpoint on such contributions, and it does not mean 
that everyone in the Hive community agrees. I tend to avoid merging such 
contributions because I have a limited amount of time and would like to focus 
on more impactful changes, but other reviewers may be willing to get this in.

> Delete redundant Javadoc for Hive
> -
>
> Key: HIVE-28403
> URL: https://issues.apache.org/jira/browse/HIVE-28403
> Project: Hive
>  Issue Type: Wish
>Reporter: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
>
> Hive has some redundant Javadoc, but there are no comments in it. I think 
> some Javadoc can be deleted.
> {code:java}
> // Some comments here
>   /**
>*
>*/
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26332) Upgrade maven-surefire-plugin to 3.3.1

2024-07-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868612#comment-17868612
 ] 

Stamatis Zampetakis commented on HIVE-26332:


Hey [~michael-o] apologies for the delay. I saw that SUREFIRE-1934 is already 
released with 3.3.1 so I tested with that version. After setting 
{{enableOutErrElements}} to false, things seem to work fine in some tests 
that I performed locally (my personal opinion is that false should be the 
default but this is another discussion). I now updated the PR upgrading to 
3.3.1 and will see how it goes for the full test run.
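
A sketch of the corresponding pom.xml fragment (the plugin coordinates and placement are assumed; {{enableOutErrElements}} is the Surefire parameter named above):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.3.1</version>
  <configuration>
    <!-- Do not embed captured stdout/stderr elements in the XML report -->
    <enableOutErrElements>false</enableOutErrElements>
  </configuration>
</plugin>
```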

> Upgrade maven-surefire-plugin to 3.3.1
> --
>
> Key: HIVE-26332
> URL: https://issues.apache.org/jira/browse/HIVE-26332
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently we use 3.0.0-M4, which was released in 2019. Since then there have 
> been multiple bug fixes and improvements.
> Worth mentioning that the interaction with JUnit 5 is much more mature as 
> well, and this is one of the main reasons driving this upgrade.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26332) Upgrade maven-surefire-plugin to 3.3.1

2024-07-25 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-26332:
---
Summary: Upgrade maven-surefire-plugin to 3.3.1  (was: Upgrade 
maven-surefire-plugin to 3.2.5)

> Upgrade maven-surefire-plugin to 3.3.1
> --
>
> Key: HIVE-26332
> URL: https://issues.apache.org/jira/browse/HIVE-26332
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently we use 3.0.0-M4, which was released in 2019. Since then there have 
> been multiple bug fixes and improvements.
> Worth mentioning that the interaction with JUnit 5 is much more mature as 
> well, and this is one of the main reasons driving this upgrade.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28402) Precommit tests fail with OOM when running split-19

2024-07-25 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28402:
--

 Summary: Precommit tests fail with OOM when running split-19
 Key: HIVE-28402
 URL: https://issues.apache.org/jira/browse/HIVE-28402
 Project: Hive
  Issue Type: Task
  Components: Testing Infrastructure
Reporter: Stamatis Zampetakis


The last 3 runs in master all fail with OOM when running split-19:
 * 
[https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2233/pipeline]
 * 
[https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2234/pipeline]
 * 
[https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2235/pipeline]

{noformat}
[2024-07-25T05:57:46.816Z] [INFO] Running 
org.apache.hadoop.hive.metastore.client.TestGetPartitions
[2024-07-25T06:00:23.926Z] Exception in thread "Thread-46" 
java.lang.OutOfMemoryError: GC overhead limit exceeded
[2024-07-25T06:00:23.926Z]  at 
java.util.Arrays.copyOfRange(Arrays.java:3664)
[2024-07-25T06:00:23.926Z]  at java.lang.String.<init>(String.java:207)
[2024-07-25T06:00:23.926Z]  at 
java.io.BufferedReader.readLine(BufferedReader.java:356)
[2024-07-25T06:00:24.907Z]  at 
java.io.BufferedReader.readLine(BufferedReader.java:389)
[2024-07-25T06:00:24.907Z]  at 
org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:89)
[2024-07-25T06:01:46.664Z] [WARNING] ForkStarter IOException: GC overhead limit 
exceeded. See the dump file 
/home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream
[2024-07-25T06:01:55.003Z] [INFO] Running 
org.apache.hadoop.hive.metastore.TestFilterHooks
[2024-07-25T06:02:21.747Z] 
[2024-07-25T06:02:21.748Z] Exception: java.lang.OutOfMemoryError thrown from 
the UncaughtExceptionHandler in thread "Thread-49"
[2024-07-25T06:03:08.707Z] [WARNING] ForkStarter IOException: GC overhead limit 
exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded
[2024-07-25T06:03:08.707Z] GC overhead limit exceeded. See the dump file 
/home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream
[2024-07-25T06:03:15.362Z] [ERROR] Error closing test event listener:
[2024-07-25T06:03:15.362Z] java.util.concurrent.CompletionException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
[2024-07-25T06:03:15.362Z] at 
java.util.concurrent.CompletableFuture.encodeThrowable 
(CompletableFuture.java:273)
[2024-07-25T06:03:15.362Z] at 
java.util.concurrent.CompletableFuture.completeThrowable 
(CompletableFuture.java:280)
[2024-07-25T06:03:15.362Z] at 
java.util.concurrent.CompletableFuture$AsyncRun.run 
(CompletableFuture.java:1643)
[2024-07-25T06:03:15.362Z] at 
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
[2024-07-25T06:03:15.362Z] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
[2024-07-25T06:03:15.362Z] at java.lang.Thread.run (Thread.java:748)
[2024-07-25T06:03:15.362Z] Caused by: java.lang.OutOfMemoryError: GC overhead 
limit exceeded
[2024-07-25T06:03:15.363Z] [ERROR] GC overhead limit exceeded -> [Help 1]
{noformat}

The OOM is also affecting PR runs.





[jira] [Commented] (HIVE-26369) Hive Insert Overwrite causing Data duplication

2024-07-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868595#comment-17868595
 ] 

Stamatis Zampetakis commented on HIVE-26369:


[~pengbei] I haven't worked on this, so I don't know, sorry.

> Hive Insert Overwrite causing Data duplication
> --
>
> Key: HIVE-26369
> URL: https://issues.apache.org/jira/browse/HIVE-26369
> Project: Hive
>  Issue Type: Bug
>Reporter: Jayram Kumar
>Priority: Critical
>
> Hive Insert Overwrite is causing Data Duplication. When there is an exception 
> while writing the file and it gets retried, the existing state does not get 
> cleaned up. It causes duplication in output. 
> It happens when the following exception is triggered.
> {code:java}
> java.io.IOException: java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:144)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:277)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:214)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2185)
>   at 
> org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2185)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
>   ... 20 more
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException):
>  Server too busy
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1500)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1446)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1356)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:812)
>   at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.

[jira] [Work started] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline

2024-07-24 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-28401 started by Stamatis Zampetakis.
--
> Drop redundant XML test report post-processing from CI pipeline
> ---
>
> Key: HIVE-28401
> URL: https://issues.apache.org/jira/browse/HIVE-28401
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The [Maven Surefire 
> plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin]
>  generates an XML report containing various information regarding the 
> execution of tests. In case of failures the system-out and system-err output 
> from the test is saved in the XML file.
> The Jenkins pipeline has a post-processing 
> [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380]
>  that attempts to remove the system-out and system-err entries from the XML 
> files generated by Surefire for all tests that passed as an attempt to save 
> disk space in the Jenkins node.
> {code:bash}
> # removes all stdout and err for passed tests
> xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 
> 'testsuite/testcase/system-err[count(../failure)=0]' 
> {code}
> This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing 
> system-out and system-err for tests that passed. 
> Moreover, when the XML report file is large xmlstarlet chokes and throws a 
> "Huge input lookup" error that skips the remaining post-processing steps and 
> makes the build fail.
> {noformat}
> [2024-07-23T16:11:26.052Z] 
> ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2:
>  internal error: Huge input lookup
> [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799  INFO 
> [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio
> [2024-07-23T16:11:26.053Z]  ^
> [2024-07-23T16:11:43.133Z] Recording test results
> [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found.
> script returned exit code 3
> {noformat}





[jira] [Created] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline

2024-07-24 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28401:
--

 Summary: Drop redundant XML test report post-processing from CI 
pipeline
 Key: HIVE-28401
 URL: https://issues.apache.org/jira/browse/HIVE-28401
 Project: Hive
  Issue Type: Task
  Components: Testing Infrastructure
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The [Maven Surefire 
plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin]
 generates an XML report containing various information regarding the execution 
of tests. In case of failures the system-out and system-err output from the 
test is saved in the XML file.

The Jenkins pipeline has a post-processing 
[step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380]
 that attempts to remove the system-out and system-err entries from the XML 
files generated by Surefire for all tests that passed, in an attempt to save 
disk space on the Jenkins node.

{code:bash}
# removes all stdout and err for passed tests
xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 
'testsuite/testcase/system-err[count(../failure)=0]' 
{code}

This cleanup step is no longer necessary since Surefire (as of 3.0.0-M4) does 
not store system-out and system-err for tests that passed. 

Moreover, when the XML report file is large, xmlstarlet chokes and throws a 
"Huge input lookup" error that skips the remaining post-processing steps and 
makes the build fail.

{noformat}
[2024-07-23T16:11:26.052Z] 
./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2:
 internal error: Huge input lookup
[2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799  INFO 
[734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio
[2024-07-23T16:11:26.053Z]  ^
[2024-07-23T16:11:43.133Z] Recording test results
[2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found.
script returned exit code 3
{noformat}
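For reference, the same filtering can be done without hitting libxml2's "Huge input lookup" limit by rewriting the report with Python's ElementTree. This is a hypothetical sketch (the function name and approach are mine, not part of the actual change, which simply drops the step); like the original xmlstarlet XPath, it only checks for {{failure}} children:

```python
import xml.etree.ElementTree as ET

def strip_passing_output(report_path):
    """Remove <system-out>/<system-err> from test cases that passed,
    mirroring the xmlstarlet command used in the pipeline. A case is
    considered passed when it has no <failure> child (same condition as
    the XPath count(../failure)=0)."""
    tree = ET.parse(report_path)
    root = tree.getroot()  # <testsuite>
    for case in root.findall("testcase"):
        if case.find("failure") is None:  # passed: no <failure> child
            for tag in ("system-out", "system-err"):
                # collect first, then remove, to avoid mutating while iterating
                for el in case.findall(tag):
                    case.remove(el)
    tree.write(report_path, encoding="utf-8", xml_declaration=True)
```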






[jira] [Commented] (HIVE-25952) Drop HiveRelMdPredicates::getPredicates(Project...) to use that of RelMdPredicates

2024-07-23 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868076#comment-17868076
 ] 

Stamatis Zampetakis commented on HIVE-25952:


I created a new PR (#5360) to gauge the impact on existing test cases. I would 
like to advance this work mainly to avoid potential wrong result issues due to 
the discrepancies outlined under HIVE-26733.

[~asolimando] I feel that HIVE-25966 is redundant. Is this really a blocker for 
this ticket? If not, I guess we can close HIVE-25966 as won't fix.

I like the analysis of differences in the description of this ticket. I agree 
with everything except the line describing the behavior for {{RexCall}} 
expressions. It seems that both Hive and Calcite check the arguments of the 
call. In Hive the check is incomplete/wrong because only the last argument of 
the call determines the result, while in Calcite all call arguments must be 
constants.



> Drop HiveRelMdPredicates::getPredicates(Project...) to use that of 
> RelMdPredicates
> --
>
> Key: HIVE-25952
> URL: https://issues.apache.org/jira/browse/HIVE-25952
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There are some differences on this method between Hive and Calcite, the idea 
> of this ticket is to unify the two methods, and then drop the override in 
> HiveRelMdPredicates in favour of the method of RelMdPredicates.
> After applying HIVE-25966, the only difference is in the test for constant 
> expressions, which can be summarized as follows:
> ||Expression Type||Is Constant for Hive?||Is Constant for Calcite?||
> |InputRef|False|False|
> |Call|True if function is deterministic (arguments are not checked), false 
> otherwise|True if function is deterministic and all operands are constants, 
> false otherwise|
> |CorrelatedVariable|False|False|
> |LocalRef|False|False|
> |Over|False|False|
> |DynamicParameter|False|True|
> |RangeRef|False|False|
> |FieldAccess|False|Given expr.field, true if expr is constant, false 
> otherwise|





[jira] [Commented] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion

2024-07-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867257#comment-17867257
 ] 

Stamatis Zampetakis commented on HIVE-28359:


As of now, we retain only the last 5 builds for each PR for at most 2 months. 
For the master branch we keep all builds for at least one year.
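A retention policy like the one described is typically configured with a build discarder in the Jenkinsfile. A minimal declarative-pipeline sketch with illustrative values matching the comment above (the exact settings used by Hive CI live in the linked commit, not here):

```groovy
pipeline {
    agent any
    options {
        // Keep at most 5 builds, each for at most 60 days; numToKeepStr and
        // daysToKeepStr are standard properties of the logRotator strategy.
        buildDiscarder(logRotator(numToKeepStr: '5', daysToKeepStr: '60'))
    }
    stages {
        stage('noop') {
            steps { echo 'build retention configured in options above' }
        }
    }
}
```

In a multibranch job these options apply per branch/PR job, which is why master and PR jobs can carry different retention values.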

> Discard old builds in Jenkins to avoid disk space exhaustion
> 
>
> Key: HIVE-28359
> URL: https://issues.apache.org/jira/browse/HIVE-28359
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: builds.txt
>
>
> Currently Jenkins retains the builds from all active branches/PRs. 
> {code:bash}
> for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls 
> -l $b | wc -l; done | sort -k2 -rn > builds.txt
> {code}
> Some PRs (e.g., 
> [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/])
>  with an excessive number of builds (i.e., 66) can easily consume many GBs of 
> data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was 
> saved on April 26, 2024 and it is now more than 2 months old.
> For master, we currently have all builds since January 2023 (previous builds 
> were manually removed as part of HIVE-28013). The builds for master currently 
> occupy 50GB of space.
> Due to the above the disk space (persistent volume) cannot be reclaimed and 
> currently it is almost full (91% /var/jenkins_home).
> {noformat}
> kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df"
> Filesystem 1K-blocks  Used Available Use% Mounted on
> overlay 98831908   4675004  94140520   5% /
> tmpfs  65536 0 65536   0% /dev
> tmpfs6645236 0   6645236   0% /sys/fs/cgroup
> /dev/sdb   308521792 278996208  29509200  91% /var/jenkins_home
> /dev/sda1   98831908   4675004  94140520   5% /etc/hosts
> shm65536 0 65536   0% /dev/shm
> tmpfs   1080112812  10801116   1% 
> /run/secrets/kubernetes.io/serviceaccount
> tmpfs6645236 0   6645236   0% /proc/acpi
> tmpfs6645236 0   6645236   0% /proc/scsi
> tmpfs6645236 0   6645236   0% /sys/firmware
> {noformat}
> Without a discard policy in place we are going to hit again HIVE-28013 or 
> other disk related issues pretty soon.





[jira] [Resolved] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion

2024-07-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28359.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
[4835968fcdf44ec759f91dbeafec71bf059de42e|https://github.com/apache/hive/commit/4835968fcdf44ec759f91dbeafec71bf059de42e].
 Thanks for the review [~kgyrtkirk]!

> Discard old builds in Jenkins to avoid disk space exhaustion
> 
>
> Key: HIVE-28359
> URL: https://issues.apache.org/jira/browse/HIVE-28359
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: builds.txt
>
>
> Currently Jenkins retains the builds from all active branches/PRs. 
> {code:bash}
> for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls 
> -l $b | wc -l; done | sort -k2 -rn > builds.txt
> {code}
> Some PRs (e.g., 
> [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/])
>  with an excessive number of builds (i.e., 66) can easily consume many GBs of 
> data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was 
> saved on April 26, 2024 and it is now more than 2 months old.
> For master, we currently have all builds since January 2023 (previous builds 
> were manually removed as part of HIVE-28013). The builds for master currently 
> occupy 50GB of space.
> Due to the above the disk space (persistent volume) cannot be reclaimed and 
> currently it is almost full (91% /var/jenkins_home).
> {noformat}
> kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df"
> Filesystem 1K-blocks  Used Available Use% Mounted on
> overlay 98831908   4675004  94140520   5% /
> tmpfs  65536 0 65536   0% /dev
> tmpfs6645236 0   6645236   0% /sys/fs/cgroup
> /dev/sdb   308521792 278996208  29509200  91% /var/jenkins_home
> /dev/sda1   98831908   4675004  94140520   5% /etc/hosts
> shm65536 0 65536   0% /dev/shm
> tmpfs   1080112812  10801116   1% 
> /run/secrets/kubernetes.io/serviceaccount
> tmpfs6645236 0   6645236   0% /proc/acpi
> tmpfs6645236 0   6645236   0% /proc/scsi
> tmpfs6645236 0   6645236   0% /sys/firmware
> {noformat}
> Without a discard policy in place we are going to hit again HIVE-28013 or 
> other disk related issues pretty soon.





[jira] [Created] (HIVE-28376) Remove unused Hive object from RelOptHiveTable

2024-07-17 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28376:
--

 Summary: Remove unused Hive object from RelOptHiveTable
 Key: HIVE-28376
 URL: https://issues.apache.org/jira/browse/HIVE-28376
 Project: Hive
  Issue Type: Task
  Components: CBO
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The 
[Hive|https://github.com/apache/hive/blob/b18d5732b4f309fdc3b8226847c9c1ebcd2476fd/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java]
 object is not used inside RelOptHiveTable so keeping a reference to it is 
wasting memory and also complicates creation of RelOptHiveTable objects 
(constructor parameter). 

Moreover, Hive objects have thread-local scope, so in general they shouldn't be 
passed around because their lifecycle becomes harder to manage.





[jira] [Resolved] (HIVE-28314) Support non-boolean WHERE conditions in CBO

2024-07-17 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28314.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
[b18d5732b4f309fdc3b8226847c9c1ebcd2476fd|https://github.com/apache/hive/commit/b18d5732b4f309fdc3b8226847c9c1ebcd2476fd].
 Thanks for the PR [~soumyakanti.das]!

> Support non-boolean WHERE conditions in CBO
> ---
>
> Key: HIVE-28314
> URL: https://issues.apache.org/jira/browse/HIVE-28314
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> h3. Filter expression with non-boolean return type
> fname=annotate_stats_filter.q
> {code:sql}
> explain select * from loc_orc where 'foo' 
> {code}
> {noformat}
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Filter 
> expression with non-boolean return type.
> {noformat}





[jira] [Updated] (HIVE-28314) Support non-boolean WHERE conditions in CBO

2024-07-17 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-28314:
---
Summary: Support non-boolean WHERE conditions in CBO  (was: Support 
literals as filter expression with non-boolean return type)

> Support non-boolean WHERE conditions in CBO
> ---
>
> Key: HIVE-28314
> URL: https://issues.apache.org/jira/browse/HIVE-28314
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> h3. Filter expression with non-boolean return type
> fname=annotate_stats_filter.q
> {code:sql}
> explain select * from loc_orc where 'foo' 
> {code}
> {noformat}
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Filter 
> expression with non-boolean return type.
> {noformat}





[jira] [Commented] (HIVE-28321) Support select alias in the having clause for CBO

2024-07-16 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866341#comment-17866341
 ] 

Stamatis Zampetakis commented on HIVE-28321:


This is essentially a revert of HIVE-8194. Unfortunately, there is not much 
background on why HIVE-8194 opted not to support aliases in the HAVING clause, 
apart from the fact that it is not standard behavior. 

At the moment various DBMSs support this feature and there are relevant 
primitives in Calcite as well (CALCITE-1306), so it makes sense to make CBO 
handle this case.

> Support select alias in the having clause for CBO
> -
>
> Key: HIVE-28321
> URL: https://issues.apache.org/jira/browse/HIVE-28321
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> fname=limit_pushdown_negative.q
> {code:sql}
> explain select value, sum(key) as sum from src group by value having sum > 
> 100 limit 20
> {code}
> {noformat}
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: 
> Encountered Select alias 'sum' in having clause 'sum > 100' This non standard 
> behavior is not supported with cbo on. Turn off cbo for these queries.
> {noformat}





[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-07-15 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866039#comment-17866039
 ] 

Stamatis Zampetakis commented on HIVE-28339:


The upgrade was initiated to address CVE-2024-23897 (among other CVEs) which 
affected ci.hive.apache.org. For more details, please check: 
https://lists.apache.org/thread/hrfo4x4tylpvf3q25ro6gys64cmcvyjz

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins file can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]





[jira] [Commented] (HIVE-28362) Fail to materialize a CTE with VOID

2024-07-08 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863714#comment-17863714
 ] 

Stamatis Zampetakis commented on HIVE-28362:


If we can infer a concrete type from the overall context of the query then that 
would be a nice improvement and definitely worth contributing in this area. 

On the other hand, I consider type derivation a different topic from supporting 
VOID/NULL type in DDLs.

> Fail to materialize a CTE with VOID
> ---
>
> Key: HIVE-28362
> URL: https://issues.apache.org/jira/browse/HIVE-28362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: pull-request-available
>
> CTE materialization fails when it includes a NULL literal.
> {code:java}
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> set hive.optimize.cte.materialize.threshold=2;
> WITH x AS (SELECT null AS null_value)
> SELECT * FROM x UNION ALL SELECT * FROM x; {code}
> Error message.
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT 
> creates a VOID type, please use CAST to specify the type, near field:  
> null_value
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8344)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8303)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7846)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11598)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11461)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12397)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12263)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:638)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13136)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1062)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2390)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2338)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2340)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2501)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2323)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12978)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13085)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:332)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:508) {code}





[jira] [Commented] (HIVE-28362) Fail to materialize a CTE with VOID

2024-07-05 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863195#comment-17863195
 ] 

Stamatis Zampetakis commented on HIVE-28362:


Currently I don't think we support CTAS with VOID types, so if we want to allow 
CTE materialization with VOID then we should address CTAS and the other cases 
first.

> Fail to materialize a CTE with VOID
> ---
>
> Key: HIVE-28362
> URL: https://issues.apache.org/jira/browse/HIVE-28362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: pull-request-available
>
> CTE materialization fails when it includes a NULL literal.
> {code:java}
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> set hive.optimize.cte.materialize.threshold=2;
> WITH x AS (SELECT null AS null_value)
> SELECT * FROM x UNION ALL SELECT * FROM x; {code}
> Error message.
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT 
> creates a VOID type, please use CAST to specify the type, near field:  
> null_value
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8344)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8303)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7846)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11598)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11461)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12397)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12263)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:638)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13136)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1062)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2390)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2338)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2340)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2501)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2323)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12978)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13085)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:332)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:508) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28362) Fail to materialize a CTE with VOID

2024-07-05 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863194#comment-17863194
 ] 

Stamatis Zampetakis commented on HIVE-28362:


I think the behavior is expected and not a bug. There is no possible way to 
infer the type of the "null" column, so we cannot pick the correct type for 
materializing it. Please check HIVE-11217 for more context.

Even without materialization, the query is quite ambiguous since we cannot 
derive the type of the result.
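As the error message suggests, a possible workaround (a hypothetical sketch, not taken from the ticket) is to CAST the NULL literal to a concrete type so the materialized CTE has a well-defined schema:

{code:sql}
set hive.optimize.cte.materialize.full.aggregate.only=false;
set hive.optimize.cte.materialize.threshold=2;

-- Casting the NULL literal gives the column a concrete type (STRING here),
-- so the CTAS used for materialization no longer sees a VOID column.
WITH x AS (SELECT CAST(null AS STRING) AS null_value)
SELECT * FROM x UNION ALL SELECT * FROM x;
{code}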

> Fail to materialize a CTE with VOID
> ---
>
> Key: HIVE-28362
> URL: https://issues.apache.org/jira/browse/HIVE-28362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: pull-request-available
>
> CTE materialization fails when it includes a NULL literal.
> {code:java}
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> set hive.optimize.cte.materialize.threshold=2;
> WITH x AS (SELECT null AS null_value)
> SELECT * FROM x UNION ALL SELECT * FROM x; {code}
> Error message.
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT 
> creates a VOID type, please use CAST to specify the type, near field:  
> null_value
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8344)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8303)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7846)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11598)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11461)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12397)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12263)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:638)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13136)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1062)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2390)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2338)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2340)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2501)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2323)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12978)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13085)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:332)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:508) {code}





[jira] [Commented] (HIVE-26332) Upgrade maven-surefire-plugin to 3.2.5

2024-07-05 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863155#comment-17863155
 ] 

Stamatis Zampetakis commented on HIVE-26332:


Currently our CI does not allow the use of 
https://repository.apache.org/snapshots/ so I can't test this widely. I know 
what needs to be changed but don't have the permissions to do so. I have asked 
for the necessary privileges and am waiting to get them. 
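For local testing (outside CI), SNAPSHOT versions of the plugin could be resolved by enabling the ASF snapshots repository in the project pom (or inside a profile in settings.xml); a minimal sketch, not the actual CI change:

{code:xml}
<!-- Hypothetical sketch: lets Maven resolve SNAPSHOT plugin versions
     (e.g. maven-surefire-plugin 3.1.3-SNAPSHOT) from the ASF snapshots repo. -->
<pluginRepositories>
  <pluginRepository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/snapshots/</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </pluginRepository>
</pluginRepositories>
{code}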

> Upgrade maven-surefire-plugin to 3.2.5
> --
>
> Key: HIVE-26332
> URL: https://issues.apache.org/jira/browse/HIVE-26332
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently we use 3.0.0-M4 which was released in 2019. Since then there have 
> been multiple bug fixes and improvements.
> It is worth mentioning that the interaction with JUnit 5 is much more mature 
> as well, and this is one of the main reasons driving this upgrade.





[jira] [Commented] (HIVE-26332) Upgrade maven-surefire-plugin to 3.2.5

2024-07-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862807#comment-17862807
 ] 

Stamatis Zampetakis commented on HIVE-26332:


I started a full test run using surefire 3.1.3-SNAPSHOT for testing the state 
after resolving SUREFIRE-1934. Once the tests finish I will share the findings.

> Upgrade maven-surefire-plugin to 3.2.5
> --
>
> Key: HIVE-26332
> URL: https://issues.apache.org/jira/browse/HIVE-26332
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently we use 3.0.0-M4 which was released in 2019. Since then there have 
> been multiple bug fixes and improvements.
> It is worth mentioning that the interaction with JUnit 5 is much more mature 
> as well, and this is one of the main reasons driving this upgrade.





[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-07-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862763#comment-17862763
 ] 

Stamatis Zampetakis commented on HIVE-28339:


The newly upgraded Jenkins image is now available at 
https://hub.docker.com/repository/docker/apache/hive-ci-jenkins/ and was 
created from the Dockerfile present in PR#5331.

Tomorrow, July 4, 2024, starting at 9:00 UTC, I will start the upgrade process. 
Basically, the main thing that needs to be done is to update the existing 
Jenkins Kubernetes deployment (deployment.apps/jenkins) and change the image to 
use "apache/hive-ci-jenkins:lts-jdk21" instead of "kgyrtkirk/htk-jenkins".
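That deployment update could be sketched as follows (assuming the container inside deployment.apps/jenkins is named "jenkins"; the actual name may differ):

{code:bash}
# Point the existing deployment at the new image; Kubernetes then
# rolls the pod over to apache/hive-ci-jenkins:lts-jdk21.
kubectl set image deployment/jenkins jenkins=apache/hive-ci-jenkins:lts-jdk21

# Watch the rollout until the upgraded pod is up.
kubectl rollout status deployment/jenkins
{code}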

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins image can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]





[jira] [Work started] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion

2024-07-03 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-28359 started by Stamatis Zampetakis.
--
> Discard old builds in Jenkins to avoid disk space exhaustion
> 
>
> Key: HIVE-28359
> URL: https://issues.apache.org/jira/browse/HIVE-28359
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Attachments: builds.txt
>
>
> Currently Jenkins retains the builds from all active branches/PRs. 
> {code:bash}
> for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls 
> -l $b | wc -l; done | sort -k2 -rn > builds.txt
> {code}
> Some PRs (e.g., 
> [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/])
>  with an excessive number of builds (i.e., 66) can easily consume many GBs of 
> data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was 
> saved on April 26, 2024 and it is now more than 2 months old.
> For master, we currently have all builds since January 2023 (previous builds 
> were manually removed as part of HIVE-28013). The builds for master 
> currently occupy 50GB of space.
> Due to the above, the disk space (persistent volume) cannot be reclaimed, 
> and currently it is almost full (91% /var/jenkins_home).
> {noformat}
> kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df"
> Filesystem 1K-blocks  Used Available Use% Mounted on
> overlay 98831908   4675004  94140520   5% /
> tmpfs  65536 0 65536   0% /dev
> tmpfs6645236 0   6645236   0% /sys/fs/cgroup
> /dev/sdb   308521792 278996208  29509200  91% /var/jenkins_home
> /dev/sda1   98831908   4675004  94140520   5% /etc/hosts
> shm65536 0 65536   0% /dev/shm
> tmpfs   1080112812  10801116   1% 
> /run/secrets/kubernetes.io/serviceaccount
> tmpfs6645236 0   6645236   0% /proc/acpi
> tmpfs6645236 0   6645236   0% /proc/scsi
> tmpfs6645236 0   6645236   0% /sys/firmware
> {noformat}
> Without a discard policy in place we are going to hit HIVE-28013 again, or 
> other disk-related issues, pretty soon.





[jira] [Created] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion

2024-07-03 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28359:
--

 Summary: Discard old builds in Jenkins to avoid disk space 
exhaustion
 Key: HIVE-28359
 URL: https://issues.apache.org/jira/browse/HIVE-28359
 Project: Hive
  Issue Type: Task
  Components: Testing Infrastructure
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
 Attachments: builds.txt

Currently Jenkins retains the builds from all active branches/PRs. 

{code:bash}
for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls -l 
$b | wc -l; done | sort -k2 -rn > builds.txt
{code}

Some PRs (e.g., 
[PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/])
 with an excessive number of builds (i.e., 66) can easily consume many GBs of 
data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was saved 
on April 26, 2024 and it is now more than 2 months old.

For master, we currently have all builds since January 2023 (previous builds 
were manually removed as part of HIVE-28013). The builds for master currently 
occupy 50GB of space.

Due to the above, the disk space (persistent volume) cannot be reclaimed, and 
currently it is almost full (91% /var/jenkins_home).

{noformat}
kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df"
Filesystem 1K-blocks  Used Available Use% Mounted on
overlay 98831908   4675004  94140520   5% /
tmpfs  65536 0 65536   0% /dev
tmpfs6645236 0   6645236   0% /sys/fs/cgroup
/dev/sdb   308521792 278996208  29509200  91% /var/jenkins_home
/dev/sda1   98831908   4675004  94140520   5% /etc/hosts
shm65536 0 65536   0% /dev/shm
tmpfs   1080112812  10801116   1% 
/run/secrets/kubernetes.io/serviceaccount
tmpfs6645236 0   6645236   0% /proc/acpi
tmpfs6645236 0   6645236   0% /proc/scsi
tmpfs6645236 0   6645236   0% /sys/firmware
{noformat}

Without a discard policy in place we are going to hit HIVE-28013 again, or 
other disk-related issues, pretty soon.
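One way to put such a policy in place (a sketch only, assuming the jobs use declarative pipeline syntax; the exact retention numbers are illustrative) is the buildDiscarder option in the Jenkinsfile:

{code:groovy}
// Hypothetical sketch: keep at most 10 builds and 30 days of history
// per branch/PR, letting Jenkins reclaim the disk space of older runs.
options {
  buildDiscarder(logRotator(numToKeepStr: '10', daysToKeepStr: '30'))
}
{code}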





[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-07-01 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861089#comment-17861089
 ] 

Stamatis Zampetakis commented on HIVE-28339:


I tested starting Jenkins from a vanilla Jenkins image 
(jenkins/jenkins:lts-jdk17) using the jenkins_home_backup.tar obtained 
previously.

{noformat}
tar -xvf jenkins_home_backup.tar
docker run -p 35000:8080 -v 
/home/stamatis/Issues/HIVE-28339/var/jenkins_home:/var/jenkins_home 
jenkins/jenkins:lts-jdk17 
{noformat}

Unfortunately Jenkins cannot start and there are many SEVERE errors due to the 
plugins and configuration that is present in the jenkins_home directory.
{noformat}
> docker logs CONTAINER_NAME 2>&1 | grep SEVERE
2024-07-01 08:04:00.498+ [id=32]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Mina SSHD API 
:: Core v2.12.1-101.v85b_e08b_780dd (mina-sshd-api-core)
2024-07-01 08:04:00.499+ [id=32]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin SSH server 
v3.322.v159e91f6a_550 (sshd)
2024-07-01 08:04:00.537+ [id=55]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Jenkins GIT 
server Plugin v1.11 (git-server)
2024-07-01 08:04:00.538+ [id=55]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Pipeline: 
Deprecated Groovy Libraries v588.v576c103a_ff86 (workflow-cps-global-lib)
2024-07-01 08:04:00.539+ [id=55]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Pipeline: 
Declarative v2.2064.v5eef7d0982b_e (pipeline-model-definition)
2024-07-01 08:04:00.540+ [id=55]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Pipeline 
implementation for Blue Ocean v1.25.5 (blueocean-pipeline-api-impl)
2024-07-01 08:04:00.541+ [id=55]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Bitbucket 
Pipeline for Blue Ocean v1.25.5 (blueocean-bitbucket-pipeline)
2024-07-01 08:04:00.542+ [id=38]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Events API for 
Blue Ocean v1.25.5 (blueocean-events)
2024-07-01 08:04:00.543+ [id=38]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Git Pipeline 
for Blue Ocean v1.25.5 (blueocean-git-pipeline)
2024-07-01 08:04:00.543+ [id=38]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin GitHub Pipeline 
for Blue Ocean v1.25.5 (blueocean-github-pipeline)
2024-07-01 08:04:00.545+ [id=33]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Blue Ocean 
Pipeline Editor v1.25.5 (blueocean-pipeline-editor)
2024-07-01 08:04:00.546+ [id=33]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Blue Ocean 
v1.25.5 (blueocean)
2024-07-01 08:04:00.603+ [id=45]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Docker Pipeline 
v1.28 (docker-workflow)
2024-07-01 08:04:00.704+ [id=45]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Matrix 
Authorization Strategy Plugin v3.2.2 (matrix-auth)
2024-07-01 08:04:04.340+ [id=46]SEVERE  
jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading global config
2024-07-01 08:04:04.342+ [id=26]SEVERE  
hudson.util.BootFailure#publish: Failed to initialize Jenkins
{noformat}

So, replying to [my own previous 
comment|https://issues.apache.org/jira/browse/HIVE-28339?focusedCommentId=17860168&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17860168]:
 it is not possible to use a vanilla Jenkins image, and we need to publish and 
maintain our own custom Jenkins image with all the necessary plugins installed.

I managed to make Jenkins start by slightly modifying the respective 
[Dockerfile|https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
 that we currently use in CI. I will raise an INFRA ticket to request that 
https://hub.docker.com/r/apache/hive-ci-jenkins/ be created so we can publish 
the image there. I will also create a PR for apache/hive with the Dockerfile so 
that we have everything in the official Apache namespace.
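A custom image along those lines could look roughly like this (hypothetical sketch; the real plugin list lives in the Dockerfile linked above, and only a few of the plugins named in the startup errors are shown):

{code}
# Start from the vanilla LTS image and pre-install the plugins that the
# existing jenkins_home configuration depends on.
FROM jenkins/jenkins:lts-jdk21
RUN jenkins-plugin-cli --plugins \
    blueocean \
    docker-workflow \
    matrix-auth
{code}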

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug and CVE fixes.

[jira] [Resolved] (HIVE-28340) Test concurrent JDBC connections with Kerberized cluster, impersonation, and HTTP transport

2024-07-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28340.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/fe2e17c3ad4773a4b1066ac525f7de2a86572eca

Thanks for the review [~dengzh]!

> Test concurrent JDBC connections with Kerberized cluster, impersonation, and 
> HTTP transport
> ---
>
> Key: HIVE-28340
> URL: https://issues.apache.org/jira/browse/HIVE-28340
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The new test case simulates a scenario with two JDBC clients doing the 
> following in parallel:
>  * client 1, continuously opens and closes connections (short-lived 
> connection)
>  * client 2, opens connection, sends fixed number of simple queries, closes 
> connection (long-lived connection)
> Since the clients are running in parallel we have one long-lived session in 
> HS2 interleaved with many short ones. 
> The test case aims to increase test coverage and guard against regressions in 
> the presence of many interleaved HS2 sessions.
> In older versions, without HIVE-27201, this test fails (with the exception 
> outlined below) when the cluster is Kerberized, and we are using HTTP 
> transport mode with impersonation enabled.
> {noformat}
> javax.security.sasl.SaslException: GSS initiate failed
>             at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_261]
>             at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:96)
>  ~[libthrift-0.16.0.jar:0.16.0]
>             at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:238) 
> ~[libthrift-0.16.0.jar:0.16.0]
>             at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:39)
>  ~[libthrift-0.16.0.jar:0.16.0]{noformat}





[jira] [Resolved] (HIVE-28310) Disable hive.optimize.join.disjunctive.transitive.predicates.pushdown by default

2024-07-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28310.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/a875a455867979758e24e51f97481f62ad80bc07

Thanks for the reviews [~asolimando] and [~kkasa]!

> Disable hive.optimize.join.disjunctive.transitive.predicates.pushdown by 
> default
> 
>
> Key: HIVE-28310
> URL: https://issues.apache.org/jira/browse/HIVE-28310
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-25758 introduced 
> hive.optimize.join.disjunctive.transitive.predicates.pushdown  to 
> conditionally limit some features of the HiveJoinPushTransitivePredicatesRule 
> which are rather unsafe and can lead to Hiveserver2 crashes (OOM, hangs, 
> etc.). 
> The property was initially set to true to retain the old behavior and prevent 
> changes in performance for those queries that work fine as is. However, when 
> the property is true there are various known cases/queries that can bring 
> down HS2 completely. When this happens debugging, finding the root cause, and 
> turning off the property may require lots of effort from developers and users.
> In this ticket, we propose to disable the property by default and thus limit 
> the optimizations performed by the rule (at least till a complete solution is 
> found for the known problematic cases).
> This change favors HS2 stability at the expense of slight performance 
> degradation in certain queries.





[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-06-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860827#comment-17860827
 ] 

Stamatis Zampetakis commented on HIVE-28339:


The bulk of stateful content that is maintained by Jenkins is located under the 
"/var/jenkins_home" directory.
 
{noformat}
kubectl exec jenkins-6858ddb664-sg6nl df
Filesystem 1K-blocks  Used Available Use% Mounted on
overlay 98831908   4612016  94203508   5% /
tmpfs  65536 0 65536   0% /dev
tmpfs6645236 0   6645236   0% /sys/fs/cgroup
/dev/sdb   308521792 279898320  28607088  91% /var/jenkins_home
/dev/sda1   98831908   4612016  94203508   5% /etc/hosts
shm65536 0 65536   0% /dev/shm
tmpfs   1080112812  10801116   1% 
/run/secrets/kubernetes.io/serviceaccount
tmpfs6645236 0   6645236   0% /proc/acpi
tmpfs6645236 0   6645236   0% /proc/scsi
tmpfs6645236 0   6645236   0% /sys/firmware
{noformat}

As expected, the persistent volume used by the Jenkins pod is mounted to the 
"/var/jenkins_home" directory (see kubectl describe 
pod/jenkins-6858ddb664-sg6nl).

For testing purposes we need to obtain a backup of the jenkins_home directory 
and try to mount it to the new (upgraded) Jenkins image to ensure that 
everything will work smoothly.

Currently, the jenkins_home directory is 280GB, which makes a complete local 
backup and testing impractical. The majority of disk space is occupied by the 
"jobs" directory and in particular by archives that are kept for each build, 
test results, and log files for each run. These files are kept for archiving 
and diagnosability purposes, so that users can consult the results of each 
build. However, they are not indispensable for the correct functioning of the 
Jenkins instance, so for the sake of our experiments we can exclude them from 
the backup. The command that was used to create the backup is given below.

{code:bash}
kubectl exec jenkins-6858ddb664-sg6nl -- tar cf - --exclude=junitResult.xml 
--exclude=*log* --exclude=archive --exclude=workflow --exclude=*git/objects* 
/var/jenkins_home > jenkins_home_backup.tar
{code}
The command took ~5 minutes to run and created an archive of 1.2GB. The 
exclusions refer to voluminous files that are nonessential for testing the 
upgrade.

I am now in the process of testing the new Jenkins image locally by mounting 
the unpacked jenkins_home_backup.tar directory to the /var/jenkins_home 
directory of the container.

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins image can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]





[jira] [Assigned] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-06-27 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-28339:
--

Assignee: Stamatis Zampetakis

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins image can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]





[jira] [Work started] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-06-27 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-28339 started by Stamatis Zampetakis.
--
> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins file can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]





[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-06-26 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860168#comment-17860168
 ] 

Stamatis Zampetakis commented on HIVE-28339:


Given that in CI we are using persistent volumes, where all Jenkins 
configurations and plugins remain as-is, I am trying to see if we really need 
to have and maintain a custom Jenkins image. Any thoughts [~abstractdog]? 

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins image can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]





[jira] [Resolved] (HIVE-28345) Avoid redundant HiveConf creation in MiniHS2.Builder

2024-06-26 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28345.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/633af371edf0823967da2dca50c3893855dab626

Thanks for the reviews [~okumin] [~simhadri-g]!

> Avoid redundant HiveConf creation in MiniHS2.Builder
> 
>
> Key: HIVE-28345
> URL: https://issues.apache.org/jira/browse/HIVE-28345
> Project: Hive
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Every creation of a MiniHS2.Builder object triggers the creation  of a 
> [HiveConf 
> object|https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L100].
>  In many cases this new configuration object is thrown away and replaced by 
> another conf object via the [withConf 
> method|https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L159].
> Creating a HiveConf object is computationally heavy so for performance 
> reasons its best to avoid it if possible.





[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-06-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859862#comment-17859862
 ] 

Stamatis Zampetakis commented on HIVE-28339:


Here is a rough outline of the steps that I have in mind.

# Build a new Jenkins image using the aforementioned Dockerfile
# Push the image to some container registry:
## [https://hub.docker.com/r/kgyrtkirk/htk-jenkins]
## [https://hub.docker.com/r/apache/hive-ci-jenkins/] (need to request a new 
Docker namespace from INFRA)
## [https://hub.docker.com/r/zabetak/hive-ci-jenkins/]
# Take backups from the existing Jenkins instance
# Test/start the new Jenkins image locally (and ensure backups are working if 
necessary)
# Send an email to dev@ about the estimated downtime
# Modify the Kubernetes deployment file to point to the new image (if necessary)
# Restart the Jenkins pod (with or without backups) and hope for the best
# Send an email when CI is operational

Please add/remove/suggest others as you see fit.

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins image can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28345) Avoid redundant HiveConf creation in MiniHS2.Builder

2024-06-24 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28345:
--

 Summary: Avoid redundant HiveConf creation in MiniHS2.Builder
 Key: HIVE-28345
 URL: https://issues.apache.org/jira/browse/HIVE-28345
 Project: Hive
  Issue Type: Improvement
  Components: Tests
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Every creation of a MiniHS2.Builder object triggers the creation of a 
[HiveConf 
object|https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L100].
In many cases this new configuration object is thrown away and replaced by 
another conf object via the [withConf 
method|https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L159].

Creating a HiveConf object is computationally heavy, so for performance reasons 
it's best to avoid it when possible.
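The idea can be sketched with the usual lazy-initialization builder pattern. This is an illustrative stand-in, not Hive's actual code: the Config class plays the role of the expensive HiveConf, and the class/field names are assumed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for HiveConf; counts constructions so the savings are observable.
class Config {
    static final AtomicInteger CREATED = new AtomicInteger();
    Config() { CREATED.incrementAndGet(); } // imagine this constructor is expensive
}

// Hypothetical sketch of the lazy pattern (names assumed, not Hive's actual code):
class Builder {
    private Config conf; // no eager "new Config()" in the field initializer

    Builder withConf(Config c) { this.conf = c; return this; }

    Config build() {
        if (conf == null) {
            conf = new Config(); // pay the construction cost only when needed
        }
        return conf;
    }
}

public class LazyConfDemo {
    public static void main(String[] args) {
        Config supplied = new Config();
        Config used = new Builder().withConf(supplied).build();
        // Only the caller's Config was ever constructed; eager initialization
        // in the builder would have created a second, throwaway instance.
        System.out.println(Config.CREATED.get()); // 1
        System.out.println(used == supplied);     // true
    }
}
```

With the eager field initializer, every Builder instantiation would bump the counter to 2 even when withConf replaces the default immediately.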



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28311) Backward compatibility of java.sql.Date and java.sql.Timestamp in hive-serde

2024-06-21 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856729#comment-17856729
 ] 

Stamatis Zampetakis commented on HIVE-28311:


[~wechar] Can you please add more details about the actual use case/setup that 
leads to this ClassCastException? It would be nice if we can ensure that the 
changes are backward compatible, but we should be mindful not to introduce 
correctness problems in doing so.

> Backward compatibility of java.sql.Date and java.sql.Timestamp in hive-serde
> 
>
> Key: HIVE-28311
> URL: https://issues.apache.org/jira/browse/HIVE-28311
> Project: Hive
>  Issue Type: Bug
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>
> HIVE-20007 introduced {{org.apache.hadoop.hive.common.type.Date}} and 
> {{org.apache.hadoop.hive.common.type.Timestamp}} to replace {{java.sql.Date}} 
> and {{{}java.sql.Timestamp{}}}.
> It's a huge improvement, but it also produces incompatibility issues for 
> clients without this update.
> {code:bash}
> Caused by: java.lang.ClassCastException: java.sql.Timestamp cannot be cast to 
> org.apache.hadoop.hive.common.type.Timestamp
> at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaTimestampObjectInspector.getPrimitiveWritableObject(JavaTimestampObjectInspector.java:33)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1232)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TimestampConverter.convert(PrimitiveObjectInspectorConverter.java:291)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getConstantObjectInspector(ObjectInspectorUtils.java:1397)
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc.getWritableObjectInspector(ExprNodeConstantDesc.java:93)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeConstantEvaluator.(ExprNodeConstantEvaluator.java:41)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.(ExprNodeGenericFuncEvaluator.java:101)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:58)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:43)
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.prepareExpr(PartExprEvalUtils.java:118)
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prunePartitionNames(PartitionPruner.java:551)
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:73)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionNamesPrunedByExprNoTxn(ObjectStore.java:3606)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.access$1000(ObjectStore.java:241)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$16.getJdoResult(ObjectStore.java:4157)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$16.getJdoResult(ObjectStore.java:4124)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3913)
> ... 30 more
> {code}
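A backward-compatible object inspector would essentially have to accept both timestamp representations. One plausible bridge (a sketch only, not Hive's actual fix) is to normalize through java.time.LocalDateTime, which both java.sql.Timestamp and Hive's timezone-agnostic type can express; the Hive branch is omitted so the sketch stays runnable with the JDK alone:

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class TimestampBridge {
    // Normalize a legacy java.sql.Timestamp to the timezone-agnostic
    // local date-time it carries. A real shim would also handle
    // org.apache.hadoop.hive.common.type.Timestamp here.
    static LocalDateTime toLocal(Object o) {
        if (o instanceof Timestamp) {
            return ((Timestamp) o).toLocalDateTime();
        }
        throw new IllegalArgumentException("unsupported: " + o.getClass());
    }

    public static void main(String[] args) {
        Timestamp legacy = Timestamp.valueOf("2024-06-21 10:15:30");
        System.out.println(toLocal(legacy)); // 2024-06-21T10:15:30
    }
}
```

Whether such a shim is safe is exactly the concern raised above: the mapping must not silently change semantics for clients that already migrated.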



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28337) TestMetaStoreUtils fails for invalid timestamps

2024-06-21 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856721#comment-17856721
 ] 

Stamatis Zampetakis commented on HIVE-28337:


The description of the ticket implies that there is a bug in the testing 
methodology but I would argue that the bug is in the production code instead.

The test aims to ensure that the conversion from/to string does not alter the 
value in any way no matter the timezone. The DATE, and TIMESTAMP datatypes in 
standard SQL are timezone agnostic and the [same semantics are adopted in 
Hive|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types].
 There are still various known issues in Hive for dates/timestamps that fall 
into DST shift and the MetaStoreUtils API is probably affected.
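The forward shift described in the ticket can be reproduced with the JDK alone. This demo uses a recent gap date (2024-03-31) rather than the ticket's 2417 example; the zone and the behavior are the same:

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class DstGapDemo {
    public static void main(String[] args) {
        // 2024-03-31 02:30 does not exist in Europe/Paris: clocks jump
        // from 02:00 straight to 03:00 on that night.
        TimeZone.setDefault(TimeZone.getTimeZone("Europe/Paris"));
        Timestamp ts = Timestamp.valueOf("2024-03-31 02:30:00");
        // valueOf() leniently normalizes the nonexistent local time forward,
        // so the round-tripped value no longer matches the input string.
        System.out.println(ts);
    }
}
```

A timezone-agnostic TIMESTAMP conversion, as the semantics above require, should not depend on the JVM default zone at all, which is why the production path rather than the test looks suspect.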

> TestMetaStoreUtils fails for invalid timestamps
> ---
>
> Key: HIVE-28337
> URL: https://issues.apache.org/jira/browse/HIVE-28337
> Project: Hive
>  Issue Type: Bug
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-06-18-12-42-05-646.png, 
> image-2024-06-18-12-42-31-472.png
>
>
> The tests 
> org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils#testTimestampToString
>  and #testDateToString fail for invalid timestamps in the following cases:
> 1. Timestamps in time-zones which observe daylight savings during which the 
> clock is set forward (typically 2:00 AM - 3:00 AM)
> Example: 2417-03-26T02:08:43 in Europe/Paris is invalid, and would get 
> converted to 2417-03-26T03:08:43 by the Timestamp.valueOf() method
> This happens because the timestamp is represented as a LocalDateTime in 
> TestMetaStoreUtils, which is independent of the time-zone of the timestamp. 
> This LocalDateTime timestamp, when combined with a time-zone, can yield an 
> invalid timestamp.
>  
> 2. Timestamps with year '0000'
> Example: 0000-01-07T22:44:36 is invalid and would get converted to 
> 0001-01-07T22:44:36 by the Timestamp.valueOf() method
> Year '0000' is invalid and should not be included while generating the test 
> cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28340) Test concurrent JDBC connections with Kerberized cluster, impersonation, and HTTP transport

2024-06-20 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-28340:
--

 Summary: Test concurrent JDBC connections with Kerberized cluster, 
impersonation, and HTTP transport
 Key: HIVE-28340
 URL: https://issues.apache.org/jira/browse/HIVE-28340
 Project: Hive
  Issue Type: Test
  Components: HiveServer2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The new test case simulates a scenario with two JDBC clients doing the 
following in parallel:
 * client 1 continuously opens and closes connections (short-lived connections)
 * client 2 opens a connection, sends a fixed number of simple queries, and 
closes the connection (long-lived connection)

Since the clients are running in parallel, we have one long-lived session in 
HS2 interleaved with many short ones.

The test case aims to increase test coverage and guard against regressions in 
the presence of many interleaved HS2 sessions.

In older versions, without HIVE-27201, this test fails (with the exception 
outlined below) when the cluster is Kerberized, and we are using HTTP transport 
mode with impersonation enabled.
{noformat}
javax.security.sasl.SaslException: GSS initiate failed
            at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
 ~[?:1.8.0_261]
            at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:96)
 ~[libthrift-0.16.0.jar:0.16.0]
            at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:238) 
~[libthrift-0.16.0.jar:0.16.0]
            at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:39)
 ~[libthrift-0.16.0.jar:0.16.0]{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

