[jira] [Resolved] (HIVE-27727) Materialized view query rewrite fails if query has decimal derived aggregate

2023-09-26 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27727.
---
Resolution: Fixed

Merged to master. Thanks [~zabetak] for review.

> Materialized view query rewrite fails if query has decimal derived aggregate
> 
>
> Key: HIVE-27727
> URL: https://issues.apache.org/jira/browse/HIVE-27727
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: cbo, materializedviews, pull-request-available
>
> {code}
> create table t1 (a int, b decimal(3,2)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create materialized view mv1 as
> select a, sum(b), count(b) from t1 group by a;
> explain cbo
> select a, avg(b) from t1 group by a;
> {code}
> MV is not used
> {code}
> CBO PLAN:
> HiveProject(a=[$0], _o__c1=[CAST(/($1, $2)):DECIMAL(7, 6)])
>   HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[count($1)])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
> If the {{avg}} input is not decimal but, for example, {{int}}, the query plan is
> rewritten to use the MV.
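
For comparison, a minimal sketch (not part of the original ticket; table name t2 is made up) of the integer-typed variant where the rewrite is expected to kick in:

{code}
-- same shape as the repro above, but the aggregated column is int instead of decimal
create table t2 (a int, b int) stored as orc TBLPROPERTIES ('transactional'='true');

create materialized view mv2 as
select a, sum(b), count(b) from t2 group by a;

-- avg(b) can be derived as sum(b)/count(b), so the CBO plan is expected to scan mv2
explain cbo
select a, avg(b) from t2 group by a;
{code}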



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27743) Semantic Search In Hive

2023-09-26 Thread Sreenath (Jira)
Sreenath created HIVE-27743:
---

 Summary: Semantic Search In Hive
 Key: HIVE-27743
 URL: https://issues.apache.org/jira/browse/HIVE-27743
 Project: Hive
  Issue Type: Wish
 Environment: *  
Reporter: Sreenath


Semantic search is a way for computers to understand the meaning behind words 
and phrases when you're searching for something. Instead of just looking for 
exact matches of keywords, it tries to figure out what you're really asking and 
provides results that are more relevant and meaningful to your question. It's 
like having a search engine that can understand what you mean, not just what 
you say, making it easier to find the information you're looking for. This 
ticket is a wish to have Semantic search in Hive.

On the implementation side, semantic search uses an embedding model and a 
similarity/distance function.

My proposal is to implement functions for on-the-fly calculation of the similarity 
distance between two values. Once we have them, we could easily do semantic 
search as part of a WHERE clause.
 * E.g. (using a cosine similarity function): "WHERE cos_dist(region, 'europe') > 
0.9". It could return records with regions like Scandinavia, Nordic, Baltic 
etc…
 * We could have functions that accept values as text or as vector embeddings (see the sketch below).
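
A rough sketch of what such a query could look like (purely illustrative: the cos_dist and embed functions and the customers table are hypothetical, none of them exist in Hive today):

{code}
-- hypothetical UDFs: embed(text) returns a vector embedding,
-- cos_dist(x, y) returns the cosine similarity of two embeddings (or two texts)
SELECT id, region
FROM customers
WHERE cos_dist(embed(region), embed('europe')) > 0.9;
{code}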



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27739) Multiple issues with timestamps with timezone - can lead to data inconsistency

2023-09-26 Thread Janos Kovacs (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Kovacs updated HIVE-27739:

Summary: Multiple issues with timestamps with timezone - can lead to data 
inconsistency  (was: Multiple issues with timestamps with timezone)

> Multiple issues with timestamps with timezone - can lead to data inconsistency
> --
>
> Key: HIVE-27739
> URL: https://issues.apache.org/jira/browse/HIVE-27739
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-beta-1
>Reporter: Janos Kovacs
>Assignee: Zoltán Rátkai
>Priority: Major
>
> The following issues were found while testing timestamps with timezones:
>  * CREATE TABLE fails with SemanticException when hive.local.time.zone is set 
> to a different valid value in the session
>  * Invalid timezone values (e.g. with a typo) are treated as UTC, which can lead 
> to data consistency / loss issues
>  * LOCAL is an invalid timezone value and is treated as UTC instead of the 
> system's timezone
> The issues are tracked as sub-tasks.
> In general, the base tests are:
> {noformat}
> SELECT
>   '\${system:user.timezone}' as os,
>   '\${hiveconf:hive.local.time.zone}' as hive,
>   'TZ'  as branch,
>   tz as orig,
>   to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}') as to_utc, 
>   
> from_utc_timestamp(to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}'),'Europe/Budapest')
>   as to_bp,
>   
> from_utc_timestamp(to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}'),'America/Los_Angeles')
>  as to_la
> FROM timestamptest;
> "
> {noformat}
>  
> The results are:
> {noformat}
> +------------------+------------------+---------+----------------------------------------+------------------------+------------------------+------------------------+
> |        os        |       hive       | branch  |                  orig                  |         to_utc         |         to_bp          |         to_la          |
> +------------------+------------------+---------+----------------------------------------+------------------------+------------------------+------------------------+
> | Europe/Budapest  | Europe/Budapest  | TZ      | 2016-01-03 21:26:34.0 Europe/Budapest  | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  |
> | Europe/Budapest  | UTC              | TZ      | 2016-01-03 20:26:34.0 UTC              | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  |
> | Europe/Budapest  | LOCAL            | TZ      | 2016-01-03 21:26:34.0 Europe/Budapest  | 2016-01-03 21:26:34.0  | 2016-01-03 22:26:34.0  | 2016-01-03 13:26:34.0  | !!!
> | UTC              | Europe/Budapest  | TZ      | 2016-01-03 21:26:34.0 Europe/Budapest  | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  |
> | UTC              | UTC              | TZ      | 2016-01-03 20:26:34.0 UTC              | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  |
> | UTC              | LOCAL            | TZ      | 2016-01-03 20:26:34.0 UTC              | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  | !!!
> +------------------+------------------+---------+----------------------------------------+------------------------+------------------------+------------------------+
> {noformat}
> The problematic cases:
>  * the "Europe/Budapest | LOCAL" case is wrong: LOCAL is treated as UTC 
> instead of the system's TZ, which introduces a 1-hour offset when converted
>  * the "UTC | LOCAL" case only looks correct because LOCAL is treated as UTC 
> all the time
> Repro code and more details are in each of the subtask tickets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27741) Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead to data consistency issues

2023-09-26 Thread Janos Kovacs (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Kovacs updated HIVE-27741:

Description: 
When the timezone specified in the *to_utc_timestamp()* function is not valid, 
it is still treated as UTC instead of throwing an error. If the user accidentally 
made a typo - e.g. America/Los{color:#ff}*t*{color}_Angeles - the query runs 
successfully, returning an incorrectly converted value, which can lead to data 
consistency issues. 

Repro code:
{noformat}
docker rm -f hive4

export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ="America/Los_Angeles"

export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
-Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS  
-Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 
--name hive4 apache/hive:${HIVE_VERSION}

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`,
   '\${system:user.timezone}' as \`system:user.timezone\`,
   '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;

DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest (
  ts timestamp,
  tz timestamp with local time zone
) STORED AS TEXTFILE;
INSERT INTO timestamptest select TIMESTAMP'2016-01-03 
12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';

SELECT
  tz as orig,
  to_utc_timestamp(tz, 'America/Los_Angeles')   as utc_correct_tz,
  to_utc_timestamp(tz, 'Europe/HereIsATypo')as utc_incorrect_tz,
  to_utc_timestamp(tz, 'LOCAL') as utc_local_aslo_incorrect_tz,
  to_utc_timestamp(tz, 'UTC')   as utc_tz
FROM timestamptest;
"
{noformat}
 
The results are:
{noformat}
+------------------+-----------------------+--------------------------------+
|      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
+------------------+-----------------------+--------------------------------+
| Europe/Budapest  | Europe/Budapest       | America/Los_Angeles            |
+------------------+-----------------------+--------------------------------+

+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
|                    orig                    |     utc_correct_tz     |    utc_incorrect_tz    | utc_local_aslo_incorrect_tz  |         utc_tz         |
+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
| 2016-01-03 12:26:34.0 America/Los_Angeles  | 2016-01-03 20:26:34.0  | 2016-01-03 12:26:34.0  | 2016-01-03 12:26:34.0        | 2016-01-03 12:26:34.0  |
+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
{noformat}
Note:
 * the invalid timezone - utc_incorrect_tz - is treated as UTC
 * LOCAL is also treated as UTC; it should in fact be treated as the system's 
timezone, but since LOCAL is also an invalid timezone value in hive4, it becomes 
UTC just like any other invalid and/or typo'd timezone value (see 
HIVE-27742)

 

Hive should throw an Exception in that case to let the user know that the 
provided timezone is wrong - at least this should be configurable, e.g. via 
something like {*}hive.strict.time.zone.check{*}.

  was:
When the timezone specified in the *to_utc_timestamp()* function is not valid, 
it still treated as UTC instead of throwing an error. If the user accidentally 
made a typo - e.g. America/Los{color:#ff}*t*{color}_Angeles, the query runs 
successfully returning an invalid converted value which can lead to data 
consistency issues. 

Repro code:
{noformat}
docker rm -f hive4

export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ="America/Los_Angeles"

export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
-Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS  
-Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 
--name hive4 apache/hive:${HIVE_VERSION}

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`,
   '\${system:user.timezone}' as \`system:user.timezone\`,
   '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;

DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest (
  ts timestamp,
  tz timestamp with local time zone
) STORED AS 

[jira] [Created] (HIVE-27742) LOCAL timezone value is treated as UTC instead of system's timezone which causes data consistency issues

2023-09-26 Thread Janos Kovacs (Jira)
Janos Kovacs created HIVE-27742:
---

 Summary: LOCAL timezone value is treated as UTC instead of 
system's timezone which causes data consistency issues
 Key: HIVE-27742
 URL: https://issues.apache.org/jira/browse/HIVE-27742
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 4.0.0-beta-1
Reporter: Janos Kovacs
Assignee: Zoltán Rátkai


The Hive configuration states:
{noformat}
HIVE_LOCAL_TIME_ZONE("hive.local.time.zone", "LOCAL",
"Sets the time-zone for displaying and interpreting time stamps. If 
this property value is set to\n" +
"LOCAL, it is not specified, or it is not a correct time-zone, the 
system default time-zone will be\n " +
"used instead. Time-zone IDs can be specified as region-based zone IDs 
(based on IANA time-zone data),\n" +
"abbreviated zone IDs, or offset IDs."),
{noformat}

But it seems that in hive4 (-beta) it is always treated as UTC - like any other 
invalid timezone value (see HIVE-27741).

Repro code:
{noformat}
docker rm -f hive4

export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ=${HS2_ENV_TZ}

export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
-Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS  
-Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 
--name hive4 apache/hive:${HIVE_VERSION}

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`,
   '\${system:user.timezone}' as \`system:user.timezone\`,
   '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;

DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest (
  ts timestamp,
  tz timestamp with local time zone
) STORED AS TEXTFILE;
INSERT INTO timestamptest select TIMESTAMP'2016-01-03 
12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';

SET hive.query.results.cache.enabled=false;

SET hive.local.time.zone=LOCAL;
SELECT '\${env:TZ}' as \`env:TZ\`,
   '\${system:user.timezone}' as \`system:user.timezone\`,
   '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
SELECT
  'LOCAL' as tzset,
  tz  as orig,
  to_utc_timestamp(tz, 'LOCAL')   as 
utc_local,
  to_utc_timestamp(tz, 'Europe/Budapest') as utc_tz,
  from_utc_timestamp(to_utc_timestamp(tz,'LOCAL'),'Europe/Budapest')  as to_bp
FROM timestamptest;

SET hive.local.time.zone=Europe/Budapest;
SELECT '\${env:TZ}' as \`env:TZ\`,
   '\${system:user.timezone}' as \`system:user.timezone\`,
   '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
SELECT
  'Europe/Budapest' 
as tzset,
  tz
as orig,
  to_utc_timestamp(tz, 'LOCAL') 
as utc_local,
  to_utc_timestamp(tz, 'Europe/Budapest')   
as utc_tz,
  from_utc_timestamp(to_utc_timestamp(tz,'Europe/Budapest'),'Europe/Budapest')  
as to_bp
FROM timestamptest;
"
{noformat}

The results are:
{noformat}
+------------------+-----------------------+--------------------------------+
|      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
+------------------+-----------------------+--------------------------------+
| Europe/Budapest  | Europe/Budapest       | LOCAL                          |
+------------------+-----------------------+--------------------------------+

+--------+----------------------------------------+------------------------+------------------------+------------------------+
| tzset  |                  orig                  |       utc_local        |         utc_tz         |         to_bp          |
+--------+----------------------------------------+------------------------+------------------------+------------------------+
| LOCAL  | 2016-01-03 21:26:34.0 Europe/Budapest  | 2016-01-03 21:26:34.0  | 2016-01-03 20:26:34.0  | 2016-01-03 22:26:34.0  |
+--------+----------------------------------------+------------------------+------------------------+------------------------+


+------------------+-----------------------+--------------------------------+
|      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
+------------------+-----------------------+--------------------------------+
| Europe/Budapest  | Europe/Budapest       | Europe/Budapest                |

[jira] [Created] (HIVE-27741) Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead to data consistency issues

2023-09-26 Thread Janos Kovacs (Jira)
Janos Kovacs created HIVE-27741:
---

 Summary: Invalid timezone value in to_utc_timestamp() is treated 
as UTC which can lead to data consistency issues
 Key: HIVE-27741
 URL: https://issues.apache.org/jira/browse/HIVE-27741
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 4.0.0-beta-1
Reporter: Janos Kovacs
Assignee: Zoltán Rátkai


When the timezone specified in the *to_utc_timestamp()* function is not valid, 
it is still treated as UTC instead of throwing an error. If the user accidentally 
made a typo - e.g. America/Los{color:#ff}*t*{color}_Angeles - the query runs 
successfully, returning an incorrectly converted value, which can lead to data 
consistency issues. 

Repro code:
{noformat}
docker rm -f hive4

export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ="America/Los_Angeles"

export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
-Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS  
-Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 
--name hive4 apache/hive:${HIVE_VERSION}

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`,
   '\${system:user.timezone}' as \`system:user.timezone\`,
   '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;

DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest (
  ts timestamp,
  tz timestamp with local time zone
) STORED AS TEXTFILE;
INSERT INTO timestamptest select TIMESTAMP'2016-01-03 
12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';

SELECT
  tz as orig,
  to_utc_timestamp(tz, 'America/Los_Angeles')   as utc_correct_tz,
  to_utc_timestamp(tz, 'Europe/HereIsATypo')as utc_incorrect_tz,
  to_utc_timestamp(tz, 'LOCAL') as utc_local_aslo_incorrect_tz,
  to_utc_timestamp(tz, 'UTC')   as utc_tz
FROM timestamptest;
"
{noformat}
 
The results are:
{noformat}
+------------------+-----------------------+--------------------------------+
|      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
+------------------+-----------------------+--------------------------------+
| Europe/Budapest  | Europe/Budapest       | America/Los_Angeles            |
+------------------+-----------------------+--------------------------------+

+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
|                    orig                    |     utc_correct_tz     |    utc_incorrect_tz    | utc_local_aslo_incorrect_tz  |         utc_tz         |
+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
| 2016-01-03 12:26:34.0 America/Los_Angeles  | 2016-01-03 20:26:34.0  | 2016-01-03 12:26:34.0  | 2016-01-03 12:26:34.0        | 2016-01-03 12:26:34.0  |
+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
{noformat}
Note:
 * the invalid timezone - utc_incorrect_tz - is treated as UTC
 * LOCAL is also treated as UTC; it should in fact be treated as the system's 
timezone, but since LOCAL is also an invalid timezone value in hive4, it becomes 
UTC just like any other invalid and/or typo'd timezone value

 

Hive should throw an Exception in that case to let the user know that the 
provided timezone is wrong - at least this should be configurable, e.g. via 
something like {*}hive.strict.time.zone.check{*}.
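
Until such a check exists, a rough way to spot the silent UTC fallback (an illustrative sketch, not part of the ticket) is to compare the suspicious conversion with an explicit UTC conversion - if a supposedly non-UTC zone yields the same result as 'UTC', the zone string was most likely rejected:

{noformat}
-- if the given zone were valid and non-UTC (at that instant), the two columns would differ;
-- identical values suggest the zone string silently fell back to UTC
SELECT
  to_utc_timestamp(tz, 'Europe/HereIsATypo')  as suspect_tz,
  to_utc_timestamp(tz, 'UTC')                 as utc_baseline
FROM timestamptest;
{noformat}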



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27740) CREATE TABLE with timestamp with timezone fails with SemanticException

2023-09-26 Thread Janos Kovacs (Jira)
Janos Kovacs created HIVE-27740:
---

 Summary: CREATE TABLE with timestamp with timezone fails with 
SemanticException
 Key: HIVE-27740
 URL: https://issues.apache.org/jira/browse/HIVE-27740
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 4.0.0-beta-1
Reporter: Janos Kovacs
Assignee: Zoltán Rátkai


CREATE TABLE with timestamp with timezone fails with SemanticException when the 
timezone is changed in the session to another valid value.

Repro code:
{noformat}
docker rm -f hive4

export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ=${HS2_ENV_TZ}

export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
-Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS  
-Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 
--name hive4 apache/hive:${HIVE_VERSION}

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`,
   '\${system:user.timezone}' as \`system:user.timezone\`,
   '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;

DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest (
  ts timestamp,
  tz timestamp with local time zone DEFAULT TIMESTAMPLOCALTZ'2016-01-03 
12:26:34 America/Los_Angeles'
) STORED AS TEXTFILE;
INSERT INTO timestamptest (ts) VALUES (TIMESTAMP'2016-01-03 20:26:34');
SELECT ts, tz from timestamptest;

SET hive.local.time.zone=Europe/Berlin;

SELECT '\${env:TZ}' as \`env:TZ\`,
   '\${system:user.timezone}' as \`system:user.timezone\`,
   '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;

SELECT ts, tz from timestamptest;

DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest (
  ts timestamp,
  tz timestamp with local time zone DEFAULT TIMESTAMPLOCALTZ'2016-01-03 
12:26:34 America/Los_Angeles'
) STORED AS TEXTFILE;
"
{noformat}
 
Querying the data works with both timezone values:
{noformat}
+------------------+-----------------------+--------------------------------+
|      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
+------------------+-----------------------+--------------------------------+
| Europe/Budapest  | Europe/Budapest       | Europe/Budapest                |
+------------------+-----------------------+--------------------------------+

+------------------------+----------------------------------------+
|           ts           |                   tz                   |
+------------------------+----------------------------------------+
| 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0 Europe/Budapest  |
+------------------------+----------------------------------------+

+------------------+-----------------------+--------------------------------+
|      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
+------------------+-----------------------+--------------------------------+
| Europe/Budapest  | Europe/Budapest       | Europe/Berlin                  |
+------------------+-----------------------+--------------------------------+

+------------------------+--------------------------------------+
|           ts           |                  tz                  |
+------------------------+--------------------------------------+
| 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0 Europe/Berlin  |
+------------------------+--------------------------------------+
{noformat}

CREATE also works with the system-set timezone value but fails once it is changed. 
The second CREATE TABLE statement in the above example fails with:
{noformat}
Error: Error while compiling statement: FAILED: SemanticException [Error 
10326]: Invalid Constraint syntax Invalid type: timestamp with local time zone 
for default value: TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles'. 
Please make sure that the type is compatible with column type: timestamp with 
local time zone (state=42000,code=10326)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27739) Multiple issues with timestamps with timezone

2023-09-26 Thread Janos Kovacs (Jira)
Janos Kovacs created HIVE-27739:
---

 Summary: Multiple issues with timestamps with timezone
 Key: HIVE-27739
 URL: https://issues.apache.org/jira/browse/HIVE-27739
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0-beta-1
Reporter: Janos Kovacs
Assignee: Zoltán Rátkai


The following issues were found while testing timestamps with timezones:
 * CREATE TABLE fails with SemanticException when hive.local.time.zone is set 
to a different valid value in the session
 * Invalid timezone values (e.g. with a typo) are treated as UTC, which can lead 
to data consistency / loss issues
 * LOCAL is an invalid timezone value and is treated as UTC instead of the 
system's timezone

The issues are tracked as sub-tasks.

In general, the base tests are:
{noformat}
SELECT
  '\${system:user.timezone}' as os,
  '\${hiveconf:hive.local.time.zone}' as hive,
  'TZ'  as branch,
  tz as orig,
  to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}') as to_utc, 
  
from_utc_timestamp(to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}'),'Europe/Budapest')
  as to_bp,
  
from_utc_timestamp(to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}'),'America/Los_Angeles')
 as to_la
FROM timestamptest;
"
{noformat}
 
The results are:
{noformat}
+------------------+------------------+---------+----------------------------------------+------------------------+------------------------+------------------------+
|        os        |       hive       | branch  |                  orig                  |         to_utc         |         to_bp          |         to_la          |
+------------------+------------------+---------+----------------------------------------+------------------------+------------------------+------------------------+
| Europe/Budapest  | Europe/Budapest  | TZ      | 2016-01-03 21:26:34.0 Europe/Budapest  | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  |
| Europe/Budapest  | UTC              | TZ      | 2016-01-03 20:26:34.0 UTC              | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  |
| Europe/Budapest  | LOCAL            | TZ      | 2016-01-03 21:26:34.0 Europe/Budapest  | 2016-01-03 21:26:34.0  | 2016-01-03 22:26:34.0  | 2016-01-03 13:26:34.0  | !!!
| UTC              | Europe/Budapest  | TZ      | 2016-01-03 21:26:34.0 Europe/Budapest  | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  |
| UTC              | UTC              | TZ      | 2016-01-03 20:26:34.0 UTC              | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  |
| UTC              | LOCAL            | TZ      | 2016-01-03 20:26:34.0 UTC              | 2016-01-03 20:26:34.0  | 2016-01-03 21:26:34.0  | 2016-01-03 12:26:34.0  | !!!
+------------------+------------------+---------+----------------------------------------+------------------------+------------------------+------------------------+
{noformat}
The problematic cases:
 * the "Europe/Budapest | LOCAL" case is wrong: LOCAL is treated as UTC instead 
of the system's TZ, which introduces a 1-hour offset when converted
 * the "UTC | LOCAL" case only looks correct because LOCAL is treated as UTC all 
the time

Repro code and more details are in each of the subtask tickets.
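
As an illustration of the expected behaviour (a sketch added for clarity, not part of the original report): once LOCAL is honoured, converting with 'LOCAL' should give the same result as converting with the system timezone spelled out explicitly:

{noformat}
-- expected equivalence: both columns should match, because LOCAL is documented
-- to resolve to the system default timezone
SELECT
  to_utc_timestamp(tz, 'LOCAL')                    as via_local,
  to_utc_timestamp(tz, '${system:user.timezone}')  as via_system_tz
FROM timestamptest;
{noformat}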



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27738) Fix Schematool version so that it can pickup correct schema script file after 4.0.0-beta-1 release

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27738:
--
Labels: pull-request-available  (was: )

> Fix Schematool version so that it can pickup correct schema script file after 
> 4.0.0-beta-1 release
> --
>
> Key: HIVE-27738
> URL: https://issues.apache.org/jira/browse/HIVE-27738
> Project: Hive
>  Issue Type: Bug
>Reporter: KIRTI RUGE
>Assignee: KIRTI RUGE
>Priority: Major
>  Labels: pull-request-available
>
> hive.version.shortname needs to be fixed in /pom.xml and 
> standalone-metastore/pom.xml so that the correct xxx4.0.0-beta-2.xx.sql file is 
> picked up



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27738) Fix Schematool version so that it can pickup correct schema script file after 4.0.0-beta-1 release

2023-09-26 Thread KIRTI RUGE (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KIRTI RUGE reassigned HIVE-27738:
-

Assignee: KIRTI RUGE

> Fix Schematool version so that it can pickup correct schema script file after 
> 4.0.0-beta-1 release
> --
>
> Key: HIVE-27738
> URL: https://issues.apache.org/jira/browse/HIVE-27738
> Project: Hive
>  Issue Type: Bug
>Reporter: KIRTI RUGE
>Assignee: KIRTI RUGE
>Priority: Major
>
> hive.version.shortname needs to be fixed in /pom.xml and 
> standalone-metastore/pom.xml so that the correct xxx4.0.0-beta-2.xx.sql file is 
> picked up



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27738) Fix Schematool version so that it can pickup correct schema script file after 4.0.0-beta-1 release

2023-09-26 Thread KIRTI RUGE (Jira)
KIRTI RUGE created HIVE-27738:
-

 Summary: Fix Schematool version so that it can pickup correct 
schema script file after 4.0.0-beta-1 release
 Key: HIVE-27738
 URL: https://issues.apache.org/jira/browse/HIVE-27738
 Project: Hive
  Issue Type: Bug
Reporter: KIRTI RUGE


hive.version.shortname needs to be fixed in /pom.xml and 
standalone-metastore/pom.xml so that the correct xxx4.0.0-beta-2.xx.sql file is 
picked up



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27733:
--
Labels: pull-request-available  (was: )

> Intermittent ConcurrentModificationException in HiveServer2
> ---
>
> Key: HIVE-27733
> URL: https://issues.apache.org/jira/browse/HIVE-27733
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>
> Some tests sporadically fail with a cause that looks like:
> {code}
> Caused by: java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
>   at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
> ~[?:?]
>   at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>  ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
>  ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   ... 51 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2

2023-09-26 Thread Henri Biestro (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769153#comment-17769153
 ] 

Henri Biestro edited comment on HIVE-27733 at 9/26/23 12:28 PM:


Indeed, it is. But the code still encounters the race condition due to sharing 
the PerfLogger between threads. The problem still potentially occurs.


was (Author: henrib):
Indeed, it is. But the code still encounters the race condition due to sharing 
the PerfLogger between threads.

> Intermittent ConcurrentModificationException in HiveServer2
> ---
>
> Key: HIVE-27733
> URL: https://issues.apache.org/jira/browse/HIVE-27733
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>
> Some tests sporadically fail with a cause that looks like:
> {code}
> Caused by: java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
>   at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
> ~[?:?]
>   at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>  ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
>  ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   ... 51 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2

2023-09-26 Thread Henri Biestro (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769153#comment-17769153
 ] 

Henri Biestro commented on HIVE-27733:
--

Indeed, it is. But the code still encounters the race condition due to sharing 
the PerfLogger between threads.

> Intermittent ConcurrentModificationException in HiveServer2
> ---
>
> Key: HIVE-27733
> URL: https://issues.apache.org/jira/browse/HIVE-27733
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>
> Some tests sporadically fail with a cause that looks like:
> {code}
> Caused by: java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
>   at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
> ~[?:?]
>   at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>  ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
>  ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   ... 51 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27737) Consider extending HIVE-17574 to aux jars

2023-09-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27737:

Description: 
[HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024]
 was about an optimization, where HDFS-based resources optionally were 
localized directly from the "original" hdfs folder instead of a tez session 
dir. This reduced the HDFS overhead, by introducing 
hive.resource.use.hdfs.location, so there are 2 cases:

1. hive.resource.use.hdfs.location=true
a) collect "HDFS temp files" and optimize their access: added files, added jars
b) collect local temp files and use the non-optimized session-based approach: 
added files, added jars, aux jars, reloadable aux jars

{code}
  // reference HDFS based resource directly, to use distribute cache 
efficiently.
  addHdfsResource(conf, tmpResources, LocalResourceType.FILE, 
getHdfsTempFilesFromConf(conf));
  // local resources are session based.
  tmpResources.addAll(
  addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
  getLocalTempFilesFromConf(conf), null).values()
  );
{code}

2. hive.resource.use.hdfs.location=false
a) original behavior: collect all jars in hs2's scope (added files, added jars, 
aux jars, reloadable aux jars) and put it to a session based directory
{code}
  // all resources including HDFS are session based.
  tmpResources.addAll(
  addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
  getTempFilesFromConf(conf), null).values()
  );
{code}

My proposal is related to case 1).
Let's say a user is about to load an aux jar from HDFS and has it set in 
hive.aux.jars.path:
{code}
hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar
{code}

In this case we can distinguish between file:// scheme resources and hdfs:// 
scheme resources:
- file scheme resources should fall into 1b) and still be used from the session dir
- hdfs scheme resources should fall into 1a) and simply be used by addHdfsResource

 
This needs a bit of attention at every usage of aux jars, because aux jars are, 
e.g., supposed to be classloaded into HS2 sessions, so HDFS-based resources need 
to be handled there as well.

  was:
[HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024]
 was about an optimization, where HDFS-based resources optionally were 
localized directly from the "original" hdfs folder instead of a tez session 
dir. This reduced the HDFS overhead, by introducing 
hive.resource.use.hdfs.location, so there are 2 cases:

1. hive.resource.use.hdfs.location=true
a) collect "HDFS temp files" and optimize their access: added files, added jars
b) collect local temp files and use the non-optimized session-based approach: 
added files, added jars, aux jars, reloadable aux jars

{code}
  // reference HDFS based resource directly, to use distribute cache 
efficiently.
  addHdfsResource(conf, tmpResources, LocalResourceType.FILE, 
getHdfsTempFilesFromConf(conf));
  // local resources are session based.
  tmpResources.addAll(
  addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
  getLocalTempFilesFromConf(conf), null).values()
  );
{code}

2. hive.resource.use.hdfs.location=false
a) original behavior: collect all jars in hs2's scope (added files, added jars, 
aux jars, reloadable aux jars) and put it to a session based directory
{code}
  // all resources including HDFS are session based.
  tmpResources.addAll(
  addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
  getTempFilesFromConf(conf), null).values()
  );
{code}

my proposal is related to 1)
let's say user is about to load an aux jar from hdfs and have it set in 
hive.aux.jars.path:
{code}
hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar
{code}

in this case: we can distinguish between file:// scheme resources and hdfs:// 
scheme resources:
- file scheme resources should fall into 1b), still be used from session dir
- hdfs scheme resources should fall into 1a), simply used by addHdfsResource

 


> Consider extending HIVE-17574 to aux jars
> -
>
> Key: HIVE-27737
> URL: https://issues.apache.org/jira/browse/HIVE-27737
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>
> [HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024]
>  was about an optimization, where HDFS-based resources optionally were 
> localized directly from the "original" hdfs folder instead of a tez session 
> dir. This reduced the HDFS overhead, by introducing 
> hive.resource.use.hdfs.location, so there are 2 cases:
> 1. hive.resource.use.hdfs.location=true
> a) 

[jira] [Updated] (HIVE-27737) Consider extending HIVE-17574 to aux jars

2023-09-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27737:

Description: 
[HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024]
 was about an optimization, where HDFS-based resources optionally were 
localized directly from the "original" hdfs folder instead of a tez session 
dir. This reduced the HDFS overhead, by introducing 
hive.resource.use.hdfs.location, so there are 2 cases:

1. hive.resource.use.hdfs.location=true
a) collect "HDFS temp files" and optimize their access: added files, added jars
b) collect local temp files and use the non-optimized session-based approach: 
added files, added jars, aux jars, reloadable aux jars

{code}
  // reference HDFS based resource directly, to use distribute cache 
efficiently.
  addHdfsResource(conf, tmpResources, LocalResourceType.FILE, 
getHdfsTempFilesFromConf(conf));
  // local resources are session based.
  tmpResources.addAll(
  addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
  getLocalTempFilesFromConf(conf), null).values()
  );
{code}

2. hive.resource.use.hdfs.location=false
a) original behavior: collect all jars in hs2's scope (added files, added jars, 
aux jars, reloadable aux jars) and put it to a session based directory
{code}
  // all resources including HDFS are session based.
  tmpResources.addAll(
  addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
  getTempFilesFromConf(conf), null).values()
  );
{code}

my proposal is related to 1)
let's say user is about to load an aux jar from hdfs and have it set in 
hive.aux.jars.path:
{code}
hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar
{code}

in this case: we can distinguish between file:// scheme resources and hdfs:// 
scheme resources:
- file scheme resources should fall into 1b), still be used from session dir
- hdfs scheme resources should fall into 1a), simply used by addHdfsResource

 

> Consider extending HIVE-17574 to aux jars
> -
>
> Key: HIVE-27737
> URL: https://issues.apache.org/jira/browse/HIVE-27737
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>
> [HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024]
>  was about an optimization, where HDFS-based resources optionally were 
> localized directly from the "original" hdfs folder instead of a tez session 
> dir. This reduced the HDFS overhead, by introducing 
> hive.resource.use.hdfs.location, so there are 2 cases:
> 1. hive.resource.use.hdfs.location=true
> a) collect "HDFS temp files" and optimize their access: added files, added 
> jars
> b) collect local temp files and use the non-optimized session-based approach: 
> added files, added jars, aux jars, reloadable aux jars
> {code}
>   // reference HDFS based resource directly, to use distribute cache 
> efficiently.
>   addHdfsResource(conf, tmpResources, LocalResourceType.FILE, 
> getHdfsTempFilesFromConf(conf));
>   // local resources are session based.
>   tmpResources.addAll(
>   addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
>   getLocalTempFilesFromConf(conf), null).values()
>   );
> {code}
> 2. hive.resource.use.hdfs.location=false
> a) original behavior: collect all jars in hs2's scope (added files, added 
> jars, aux jars, reloadable aux jars) and put it to a session based directory
> {code}
>   // all resources including HDFS are session based.
>   tmpResources.addAll(
>   addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
>   getTempFilesFromConf(conf), null).values()
>   );
> {code}
> my proposal is related to 1)
> let's say user is about to load an aux jar from hdfs and have it set in 
> hive.aux.jars.path:
> {code}
> hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar
> {code}
> in this case: we can distinguish between file:// scheme resources and hdfs:// 
> scheme resources:
> - file scheme resources should fall into 1b), still be used from session dir
> - hdfs scheme resources should fall into 1a), simply used by addHdfsResource
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27737) Extending HIVE-17574 to aux jars

2023-09-26 Thread Jira
László Bodor created HIVE-27737:
---

 Summary: Extending HIVE-17574 to aux jars
 Key: HIVE-27737
 URL: https://issues.apache.org/jira/browse/HIVE-27737
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27737) Consider extending HIVE-17574 to aux jars

2023-09-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27737:

Summary: Consider extending HIVE-17574 to aux jars  (was: Extending 
HIVE-17574 to aux jars)

> Consider extending HIVE-17574 to aux jars
> -
>
> Key: HIVE-27737
> URL: https://issues.apache.org/jira/browse/HIVE-27737
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27722) Added org.bouncycastle as dependency and excluded the dependencies of the same which lead to version mismatch while running tests

2023-09-26 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-27722.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

> Added org.bouncycastle as dependency and excluded the dependencies of the 
> same which lead to version mismatch while running tests
> -
>
> Key: HIVE-27722
> URL: https://issues.apache.org/jira/browse/HIVE-27722
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27704) Remove PowerMock from jdbc-handler and upgrade mockito to 4.11

2023-09-26 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-27704:
---

Assignee: KIRTI RUGE

> Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
> --
>
> Key: HIVE-27704
> URL: https://issues.apache.org/jira/browse/HIVE-27704
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Assignee: KIRTI RUGE
>Priority: Major
>  Labels: newbie, pull-request-available, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27736) Remove PowerMock from itests-jmh and upgrade mockito

2023-09-26 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27736:
---

 Summary: Remove PowerMock from itests-jmh and upgrade mockito
 Key: HIVE-27736
 URL: https://issues.apache.org/jira/browse/HIVE-27736
 Project: Hive
  Issue Type: Sub-task
Reporter: Ayush Saxena
Assignee: Zsolt Miskolczi


Remove power mock from itests-jmh module



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27701) Remove PowerMock from llap-client and upgrade mockito to 4.11

2023-09-26 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27701:

Parent: HIVE-27735
Issue Type: Sub-task  (was: Task)

> Remove PowerMock from llap-client and upgrade mockito to 4.11
> -
>
> Key: HIVE-27701
> URL: https://issues.apache.org/jira/browse/HIVE-27701
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Priority: Major
>  Labels: newbie, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27704) Remove PowerMock from jdbc-handler and upgrade mockito to 4.11

2023-09-26 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27704:

Parent: HIVE-27735
Issue Type: Sub-task  (was: Task)

> Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
> --
>
> Key: HIVE-27704
> URL: https://issues.apache.org/jira/browse/HIVE-27704
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Priority: Major
>  Labels: newbie, pull-request-available, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27705) Remove PowerMock from service (hive-service) and upgrade mockito to 4.11

2023-09-26 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27705:

Parent: HIVE-27735
Issue Type: Sub-task  (was: Task)

> Remove PowerMock from service (hive-service) and upgrade mockito to 4.11
> 
>
> Key: HIVE-27705
> URL: https://issues.apache.org/jira/browse/HIVE-27705
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Priority: Major
>  Labels: newbie, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27702) Remove PowerMock from beeline and upgrade mockito to 4.11

2023-09-26 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27702:

Parent: HIVE-27735
Issue Type: Sub-task  (was: Task)

> Remove PowerMock from beeline and upgrade mockito to 4.11
> -
>
> Key: HIVE-27702
> URL: https://issues.apache.org/jira/browse/HIVE-27702
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: newbie, pull-request-available, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26455) Remove PowerMockito from hive-exec

2023-09-26 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-26455:

Parent: HIVE-27735
Issue Type: Sub-task  (was: Improvement)

> Remove PowerMockito from hive-exec
> --
>
> Key: HIVE-26455
> URL: https://issues.apache.org/jira/browse/HIVE-26455
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Zsolt Miskolczi
>Assignee: Zsolt Miskolczi
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> PowerMockito is a Mockito extension that introduces some pain points. 
> The main intention behind it is to be able to do static mocking. Since then, 
> mockito-inline has been released as part of mockito-core. 
> It doesn't require the vintage test runner to be able to run, and it can mock 
> objects within their own thread. 
> The goal is to stop using PowerMockito and use mockito-inline instead.
>  
> The affected packages are: 
>  * org.apache.hadoop.hive.ql.exec.repl
>  * org.apache.hadoop.hive.ql.exec.repl.bootstrap.load
>  * org.apache.hadoop.hive.ql.exec.repl.ranger;
>  * org.apache.hadoop.hive.ql.exec.util
>  * org.apache.hadoop.hive.ql.parse.repl
>  * org.apache.hadoop.hive.ql.parse.repl.load.message
>  * org.apache.hadoop.hive.ql.parse.repl.metric
>  * org.apache.hadoop.hive.ql.txn.compactor
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27735) Remove Powermock

2023-09-26 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27735:
---

 Summary: Remove Powermock
 Key: HIVE-27735
 URL: https://issues.apache.org/jira/browse/HIVE-27735
 Project: Hive
  Issue Type: Task
Reporter: Ayush Saxena


Powermock has compatibility issues with higher java versions & can be replaced 
with mockito



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27734) Add Iceberg's storage-partitioned join capabilities to Hive's [sorted-]bucket-map-join

2023-09-26 Thread Janos Kovacs (Jira)
Janos Kovacs created HIVE-27734:
---

 Summary: Add Iceberg's storage-partitioned join capabilities to 
Hive's [sorted-]bucket-map-join
 Key: HIVE-27734
 URL: https://issues.apache.org/jira/browse/HIVE-27734
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Affects Versions: 4.0.0-alpha-2
Reporter: Janos Kovacs


Iceberg's 'data bucketing' is implemented through its rich (function-based) 
partitioning feature, which helps optimize join operations - so-called storage-partitioned 
joins. 

doc: 
[https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE/edit#heading=h.82w8qxfl2uwl]
spark impl.: https://issues.apache.org/jira/browse/SPARK-37375

This feature is not yet leveraged by Hive's bucket-map-join optimization, neither 
alone nor combined with Iceberg's SortOrder for a sorted-bucket-map-join.
Customers migrating from the Hive table format to Iceberg with a storage-optimized 
schema will experience performance degradation on large tables, where Iceberg's 
gain from avoiding file listing is significantly smaller than the join performance 
lost compared to a bucket-map-join or even a sorted-bucket-map-join.
 
{noformat}
SET hive.query.results.cache.enabled=false;
SET hive.fetch.task.conversion = none;
SET hive.optimize.bucketmapjoin=true;
SET hive.convert.join.bucket.mapjoin.tez=true;
SET hive.auto.convert.join.noconditionaltask.size=1000;
--if you are working with external table, you need this for bmj:
SET hive.disable.unsafe.external.table.operations=false;


-- HIVE BUCKET-MAP-JOIN
DROP TABLE IF EXISTS default.hivebmjt1 PURGE;
DROP TABLE IF EXISTS default.hivebmjt2 PURGE;
CREATE TABLE default.hivebmjt1 (id int, txt string) CLUSTERED BY (id) INTO 8 
BUCKETS;
CREATE TABLE default.hivebmjt2 (id int, txt string);
INSERT INTO default.hivebmjt1 VALUES 
(1,'1'),(2,'2'),(3,'3'),(4,'4'),(5,'5'),(6,'6'),(7,'7'),(8,'8');
INSERT INTO default.hivebmjt2 VALUES (1,'1'),(2,'2'),(3,'3'),(4,'4');

EXPLAIN
SELECT * FROM default.hivebmjt1 f INNER  JOIN default.hivebmjt2 d ON f.id = 
d.id;
EXPLAIN
SELECT * FROM default.hivebmjt1 f LEFT OUTER JOIN default.hivebmjt2 d ON f.id = 
d.id;
-- Both are optimized into BMJ


-- ICEBERG BUCKET-MAP-JOIN via Iceberg's storage-partitioned join
DROP TABLE IF EXISTS default.icespbmjt1 PURGE;
DROP TABLE IF EXISTS default.icespbmjt2 PURGE;
CREATE TABLE default.icespbmjt1 (txt string) PARTITIONED BY (id int) STORED BY 
ICEBERG ;
CREATE TABLE default.icespbmjt2 (txt string) PARTITIONED BY (id int) STORED BY 
ICEBERG ;
INSERT INTO default.icespbmjt1 VALUES ('1',1),('2',2),('3',3),('4',4);
INSERT INTO default.icespbmjt2 VALUES ('1',1),('2',2),('3',3),('4',4);

EXPLAIN
SELECT * FROM default.icespbmjt1 f INNER  JOIN default.icespbmjt2 d ON f.id 
= d.id;
EXPLAIN
SELECT * FROM default.icespbmjt1 f LEFT OUTER JOIN default.icespbmjt2 d ON f.id 
= d.id;
-- Only Map-Join optimised
{noformat}
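
For reference, a sketch (not part of the original repro; table names are made up) of the bucketed Iceberg layout this request targets, assuming Hive's PARTITIONED BY SPEC syntax for Iceberg partition transforms:

{noformat}
-- Iceberg tables bucketed on the join key via the bucket() partition transform;
-- with storage-partitioned join support this could be planned as a bucket-map-join
CREATE TABLE default.icebmjt1 (txt string, id int)
  PARTITIONED BY SPEC (bucket(8, id)) STORED BY ICEBERG;
CREATE TABLE default.icebmjt2 (txt string, id int)
  PARTITIONED BY SPEC (bucket(8, id)) STORED BY ICEBERG;

EXPLAIN
SELECT * FROM default.icebmjt1 f INNER JOIN default.icebmjt2 d ON f.id = d.id;
{noformat}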



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2

2023-09-26 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769064#comment-17769064
 ] 

Stamatis Zampetakis commented on HIVE-27733:


This seems to be a duplicate of HIVE-18928, HIVE-19133.

> Intermittent ConcurrentModificationException in HiveServer2
> ---
>
> Key: HIVE-27733
> URL: https://issues.apache.org/jira/browse/HIVE-27733
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>
> Some tests sporadically fail with a cause that looks like:
> {code}
> Caused by: java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
>   at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
> ~[?:?]
>   at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>  ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
>  ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   ... 51 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2

2023-09-26 Thread Henri Biestro (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henri Biestro updated HIVE-27733:
-
Description: 
Some tests sporadically fail with a cause that looks like:
{code}
Caused by: java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
~[?:?]
at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
 ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
 ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
... 51 more
{code}

  was:
Some tests sporadically fail with a cause that looks like:

Caused by: java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
~[?:?]
at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
 ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
 ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
... 51 more


> Intermittent ConcurrentModificationException in HiveServer2
> ---
>
> Key: HIVE-27733
> URL: https://issues.apache.org/jira/browse/HIVE-27733
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>
> Some tests sporadically fail with a cause that looks like:
> {code}
> Caused by: java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
>   at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
> ~[?:?]
>   at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
> 

[jira] [Commented] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2

2023-09-26 Thread Henri Biestro (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769055#comment-17769055
 ] 

Henri Biestro commented on HIVE-27733:
--

The stack trace indicates that the PerfLogger endTimes map is modified 
concurrently while it is being copied. Note that the class is not thread-safe, 
since it was originally expected to be used through a ThreadLocal and static 
methods. However, PerfLogger.setPerfLogger sets that thread-local member and is 
called by SQLOperation.BackgroundWork, which shares the (parent) instance of the 
PerfLogger, essentially creating a race condition between the parent thread and 
background workers. I'd suggest using a ConcurrentHashMap instead of a HashMap 
as a blind fix for the internal endTimes/startTimes maps.
...
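
A minimal sketch of the suggested fix, assuming the two timing maps are 
currently plain HashMap fields; class, field and method names below are 
illustrative rather than a verbatim excerpt of 
org.apache.hadoop.hive.ql.log.PerfLogger:
{code:java}
// Sketch of the proposed change (illustrative names, not Hive's actual class).
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerfLoggerSketch {
  // ConcurrentHashMap tolerates concurrent puts while another thread iterates,
  // so copying endTimes no longer throws ConcurrentModificationException.
  private final Map<String, Long> startTimes = new ConcurrentHashMap<>();
  private final Map<String, Long> endTimes = new ConcurrentHashMap<>();

  public void perfLogBegin(String method) {
    startTimes.put(method, System.currentTimeMillis());
  }

  public void perfLogEnd(String method) {
    endTimes.put(method, System.currentTimeMillis());
  }

  public Map<String, Long> getEndTimes() {
    // Stands in for the Guava ImmutableMap.copyOf call in the stack trace above;
    // safe to call even while background workers are still logging timings.
    return Map.copyOf(endTimes);
  }
}
{code}
ConcurrentHashMap iterators are weakly consistent rather than fail-fast, so a 
snapshot taken by the parent thread can proceed while a background worker is 
still recording timings.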

> Intermittent ConcurrentModificationException in HiveServer2
> ---
>
> Key: HIVE-27733
> URL: https://issues.apache.org/jira/browse/HIVE-27733
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>
> Some tests sporadically fail with a cause that looks like:
> Caused by: java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
>   at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
>   at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
> ~[?:?]
>   at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) 
> ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>  ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
>  ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
>   ... 51 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2

2023-09-26 Thread Henri Biestro (Jira)
Henri Biestro created HIVE-27733:


 Summary: Intermittent ConcurrentModificationException in 
HiveServer2
 Key: HIVE-27733
 URL: https://issues.apache.org/jira/browse/HIVE-27733
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0-beta-1
Reporter: Henri Biestro
Assignee: Henri Biestro


Some tests sporadically fail with a cause that looks like:

Caused by: java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
at java.util.AbstractCollection.toArray(AbstractCollection.java:200) 
~[?:?]
at com.google.common.collect.Iterables.toArray(Iterables.java:285) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
 ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
 ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774]
... 51 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27731) Perform metadata delete when only static filters are present

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27731:
--
Labels: pull-request-available  (was: )

> Perform metadata delete when only static filters are present
> 
>
> Key: HIVE-27731
> URL: https://issues.apache.org/jira/browse/HIVE-27731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>
> When the query has static filters only, try to perform a metadata delete 
> directly rather than moving forward with positional delete.
> Some relevant use cases where metadata deletes can be used - 
> {code:java}
> DELETE FROM ice_table where id = 1;{code}
> As seen above, the only filter is (id = 1). When the filter corresponds to a 
> partition column, a metadata delete is more efficient and does not generate 
> additional delete files.
> For partition evolution cases, if it is not possible to perform a metadata 
> delete, a positional delete is done instead.
> Another optimisation here is to utilize vectorized expressions for UDFs that 
> provide them, such as year - 
> {code:java}
> DELETE FROM ice_table where id = 1 AND year(datecol) = 2015;{code}
> Delete queries with multi-table scans will not be optimized with this method, 
> since the where clauses are only determined at runtime.
> A similar optimisation exists in Spark, where a metadata delete is done 
> whenever possible - 
> [https://github.com/apache/iceberg/blob/master/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L297-L389]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27732) Backward compatibility for Hive with Components like Spark

2023-09-26 Thread Aman Raj (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Raj updated HIVE-27732:

Description: 
Added additional functions for OSS Spark 3.3 and HMS 3.1.2 compatibility.
These are the functions used by Spark when it integrates with Hive:
 
Below is the list of all functions called by HiveShim.scala in the Spark 3.3 
codebase; a sketch of the reflective call pattern used by several of these 
entries follows the list.
1. hive.dropDatabase(dbName, deleteData, ignoreUnknownDb, cascade)
2. hive.alterDatabase(dbName, d)
3. hive.getDatabase(dbName)
4. hive.getAllDatabases.asScala.toSeq
5. hive.getDatabasesByPattern(pattern).asScala.toSeq
6. hive.databaseExists(dbName)
7. getAllPartitionsMethod.invoke(hive, table)
8. getPartitionsByFilterMethod.invoke(hive, table, filter)
9. alterTableMethod.invoke(hive, tableName, table, 
environmentContextInAlterTable)
10. alterPartitionsMethod.invoke(hive, tableName, newParts, 
environmentContextInAlterTable)
11. hive.createTable(table, ifNotExists)
12. hive.getTable(database, tableName)
13. hive.getTable(dbName, tableName, throwException)
14. hive.getTable(tableName)
15. getTablesByTypeMethod.invoke(hive, dbName, pattern, tableType)
16. hive.getTablesByPattern(dbName, pattern).asScala.toSeq
17. hive.getAllTables(dbName).asScala.toSeq
18. hive.dropTable(dbName, tableName, deleteData, ignoreIfNotExists)
19. hive.dropTable(dbName, tableName)
20. dropTableMethod.invoke(hive, dbName, tableName, deleteData: 
JBoolean,ignoreIfNotExists: JBoolean, purge: JBoolean)
21. hive.getPartition(table, partSpec, forceCreate)
22. hive.getPartitions(table, partSpec).asScala.toSeq
23. hive.getPartitionNames(dbName, tableName, max).asScala.toSeq
24. hive.getPartitionNames(dbName, tableName, partSpec, max).asScala.toSeq
25. createPartitionMethod.invoke(
  hive,
  table,
  spec,
  location,
  params, // partParams
  null, // inputFormat
  null, // outputFormat
  -1: JInteger, // numBuckets
  null, // cols
  null, // serializationLib
  null, // serdeParams
  null, // bucketCols
  null) // sortCols
  }
26. hive.createPartitions(addPartitionDesc)
27. loadPartitionMethod.invoke(hive, loadPath, tableName, partSpec, replace: 
JBoolean,
  inheritTableSpecs: JBoolean, isSkewedStoreAsSubdir: JBoolean,
  isSrcLocal: JBoolean, isAcid, hasFollowingStatsTask)
28. hive.renamePartition(table, oldPartSpec, newPart)
29. loadTableMethod.invoke(hive, loadPath, tableName, loadFileType.get, 
isSrcLocal: JBoolean,
  isSkewedStoreAsSubdir, isAcidIUDoperation, hasFollowingStatsTask,
  writeIdInLoadTableOrPartition, stmtIdInLoadTableOrPartition: JInteger, 
replace: JBoolean)

 

30. loadDynamicPartitionsMethod.invoke(hive, loadPath, tableName, partSpec, 
loadFileType.get,
  numDP: JInteger, listBucketingLevel, isAcid, 
writeIdInLoadTableOrPartition,
  stmtIdInLoadTableOrPartition, hasFollowingStatsTask, 
AcidUtils.Operation.NOT_ACID,
  replace: JBoolean)
31. hive.createFunction(toHiveFunction(func, db))
32. hive.dropFunction(db, name)
33. hive.alterFunction(db, oldName, hiveFunc)
34. hive.getFunctions(db, pattern).asScala.toSeq
35. dropIndexMethod.invoke(hive, dbName, tableName, indexName, 
throwExceptionInDropIndex,
  deleteDataInDropIndex)
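
Several of the entries above are invoked reflectively (e.g. item 7, 
getAllPartitionsMethod.invoke(hive, table)), which is what makes method 
signatures part of the compatibility contract. A minimal sketch of that call 
pattern, with class and method names assumed purely for illustration:
{code:java}
// Illustrative sketch of a shim-style reflective call (class/method names are
// assumptions for illustration; Spark resolves them per Hive version in HiveShim.scala).
import java.lang.reflect.Method;

public class ShimSketch {
  public static Object getAllPartitions(Object hive, Object table) throws Exception {
    Class<?> hiveClass = Class.forName("org.apache.hadoop.hive.ql.metadata.Hive");
    Class<?> tableClass = Class.forName("org.apache.hadoop.hive.ql.metadata.Table");
    // The method is looked up by name and parameter types at runtime, so the caller
    // compiles against any Hive version but only works if this exact signature exists.
    Method getAllPartitionsMethod = hiveClass.getMethod("getAllPartitionsOf", tableClass);
    return getAllPartitionsMethod.invoke(hive, table);
  }
}
{code}
If Hive renames such a method or changes its parameter types, the lookup fails 
at runtime with NoSuchMethodException, which is the kind of breakage this 
compatibility work aims to avoid.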

> Backward compatibility for Hive with Components like Spark
> --
>
> Key: HIVE-27732
> URL: https://issues.apache.org/jira/browse/HIVE-27732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>
> Added additional functions for OSS Spark 3.3 and HMS 3.1.2 compatibility.
> These are the functions used by Spark when it integrates with Hive:
>  
> Below is the list of all functions called by HiveShim.scala in the Spark 3.3 
> codebase.
> 1. hive.dropDatabase(dbName, deleteData, ignoreUnknownDb, cascade)
> 2. hive.alterDatabase(dbName, d)
> 3. hive.getDatabase(dbName)
> 4. hive.getAllDatabases.asScala.toSeq
> 5. hive.getDatabasesByPattern(pattern).asScala.toSeq
> 6. hive.databaseExists(dbName)
> 7. getAllPartitionsMethod.invoke(hive, table)
> 8. getPartitionsByFilterMethod.invoke(hive, table, filter)
> 9. alterTableMethod.invoke(hive, tableName, table, 
> environmentContextInAlterTable)
> 10. alterPartitionsMethod.invoke(hive, tableName, newParts, 
> environmentContextInAlterTable)
> 11. hive.createTable(table, ifNotExists)
> 12. hive.getTable(database, tableName)
> 13. hive.getTable(dbName, tableName, throwException)
> 14. hive.getTable(tableName)
> 15. getTablesByTypeMethod.invoke(hive, dbName, pattern, tableType)
> 16. hive.getTablesByPattern(dbName, pattern).asScala.toSeq
> 17. hive.getAllTables(dbName).asScala.toSeq
> 18. hive.dropTable(dbName, tableName, deleteData, ignoreIfNotExists)
> 19. 

[jira] [Created] (HIVE-27732) Backward compatibility for Hive with Components like Spark

2023-09-26 Thread Aman Raj (Jira)
Aman Raj created HIVE-27732:
---

 Summary: Backward compatibility for Hive with Components like Spark
 Key: HIVE-27732
 URL: https://issues.apache.org/jira/browse/HIVE-27732
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.2.0
Reporter: Aman Raj
Assignee: Aman Raj






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27731) Perform metadata delete when only static filters are present

2023-09-26 Thread Sourabh Badhya (Jira)
Sourabh Badhya created HIVE-27731:
-

 Summary: Perform metadata delete when only static filters are 
present
 Key: HIVE-27731
 URL: https://issues.apache.org/jira/browse/HIVE-27731
 Project: Hive
  Issue Type: Improvement
Reporter: Sourabh Badhya
Assignee: Sourabh Badhya


When the query has static filters only, try to perform a metadata delete 
directly rather than moving forward with positional delete.

Some relevant use cases where metadata deletes can be used - 
{code:java}
DELETE FROM ice_table where id = 1;{code}
As seen above, the only filter is (id = 1). When the filter corresponds to a 
partition column, a metadata delete is more efficient and does not generate 
additional delete files.

For partition evolution cases, if it is not possible to perform a metadata 
delete, a positional delete is done instead.

Another optimisation here is to utilize vectorized expressions for UDFs that 
provide them, such as year - 
{code:java}
DELETE FROM ice_table where id = 1 AND year(datecol) = 2015;{code}
Delete queries with multi-table scans will not be optimized with this method, 
since the where clauses are only determined at runtime.

A similar optimisation exists in Spark, where a metadata delete is done 
whenever possible - 
[https://github.com/apache/iceberg/blob/master/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L297-L389]
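
A minimal sketch of what such a metadata-only delete looks like when expressed 
directly against the Iceberg Java API; this illustrates the technique under the 
assumption that the whole predicate can be pushed down as an Iceberg 
Expression, and is not Hive's actual code path:
{code:java}
// Sketch only: a metadata-only delete via the Iceberg Java API. The predicate is
// pushed down as an Expression, so matching data files are removed from the new
// snapshot's metadata without writing positional delete files.
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

public class MetadataDeleteSketch {
  public static void deleteById(Table iceTable, long id) {
    iceTable.newDelete()                                   // DeleteFiles operation
        .deleteFromRowFilter(Expressions.equal("id", id))  // static filter only
        .commit();                                         // commits a new snapshot
  }
}
{code}
When the filter only touches partition columns, every data file either fully 
matches or fully misses the predicate, which is why partition-column filters 
are the safe case for dropping files at the metadata level instead of writing 
positional deletes.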



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause

2023-09-26 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-21100:

Priority: Major  (was: Minor)

> Allow flattening of table subdirectories resulted when using TEZ engine and 
> UNION clause
> 
>
> Key: HIVE-21100
> URL: https://issues.apache.org/jira/browse/HIVE-21100
> Project: Hive
>  Issue Type: Improvement
>Reporter: George Pachitariu
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, 
> HIVE-21100.3.patch, HIVE-21100.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Right now, when data is written into a table with the Tez engine and UNION ALL 
> is the last step of the query, Hive on Tez creates a subdirectory for each 
> branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and 
> moved to the parent directory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27730) Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.4

2023-09-26 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27730:
---

 Summary: Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 
1.1.10.4
 Key: HIVE-27730
 URL: https://issues.apache.org/jira/browse/HIVE-27730
 Project: Hive
  Issue Type: Bug
Reporter: Ayush Saxena


PR from [dependabot|https://github.com/apps/dependabot]: 

[https://github.com/apache/hive/pull/4746]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)