[jira] [Updated] (HIVE-27744) privileges check is skipped when using partly dynamic partition write.
[ https://issues.apache.org/jira/browse/HIVE-27744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shuaiqi.guo updated HIVE-27744:
---
Description:
The privileges check is skipped when using a dynamic partition write with part of the partition specified, as in the following example:
{code:java}
insert overwrite table test_privilege partition (`date` = '2023-09-27', hour) ...
{code}
Hive will execute it directly without checking write privileges. Use the following patch to fix this bug.

was:
The privileges check is skipped when using a dynamic partition write with part of the partition specified, as in the following example:
{code:java}
insert overwrite table test_privilege partition (`date` = '2023-09-27', hour) ...
{code}
Hive will execute it directly without checking write privileges.

> privileges check is skipped when using partly dynamic partition write.
> --
>
> Key: HIVE-27744
> URL: https://issues.apache.org/jira/browse/HIVE-27744
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: All Versions
> Reporter: shuaiqi.guo
> Priority: Blocker
> Fix For: 2.3.5
>
> Attachments: HIVE-27744.patch
>
> The privileges check is skipped when using a dynamic partition write with
> part of the partition specified, as in the following example:
> {code:java}
> insert overwrite table test_privilege partition (`date` = '2023-09-27', hour)
> ... {code}
> Hive will execute it directly without checking write privileges.
>
> Use the following patch to fix this bug.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
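The failure mode in HIVE-27744 above can be sketched outside Hive: an authorizer that only emits a write target for fully specified partition specs silently skips partly dynamic ones. The function below is a hypothetical stand-in, not Hive's actual authorization API; it illustrates the intended fix of falling back to table-level write authorization whenever any partition column is dynamic.

```python
# Sketch of the bug class: an authorizer that only collects a write-privilege
# target for fully static partition specs, plus the fix of falling back to
# the table itself when any partition column is dynamic.
# All names here are illustrative, not Hive's real authorizer API.

def write_targets(table, partition_spec):
    """Return the objects whose WRITE privilege must be checked.

    partition_spec maps partition column -> literal value, or None when the
    column is dynamic, e.g. {'date': '2023-09-27', 'hour': None}.
    """
    if partition_spec and all(v is not None for v in partition_spec.values()):
        # Fully static spec: authorize the single target partition.
        name = '/'.join(f'{k}={v}' for k, v in partition_spec.items())
        return [f'{table}@{name}']
    # Partly (or fully) dynamic spec: the concrete partitions are only known
    # at run time, so authorization must cover the whole table. The buggy
    # behaviour amounted to returning nothing here, skipping the check.
    return [table]

# write_targets('test_privilege', {'date': '2023-09-27', 'hour': None})
#   -> ['test_privilege']  (table-level check; exact partitions unknown yet)
# write_targets('test_privilege', {'date': '2023-09-27', 'hour': '10'})
#   -> ['test_privilege@date=2023-09-27/hour=10']
```

Under the buggy behaviour the partly dynamic branch produced no targets at all, so nothing was checked; always returning at least the table guarantees a write privilege is verified.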
[jira] [Updated] (HIVE-27744) privileges check is skipped when using partly dynamic partition write.
[ https://issues.apache.org/jira/browse/HIVE-27744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shuaiqi.guo updated HIVE-27744:
---
Attachment: HIVE-27744.patch

> privileges check is skipped when using partly dynamic partition write.
> --
>
> Key: HIVE-27744
> URL: https://issues.apache.org/jira/browse/HIVE-27744
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: All Versions
> Reporter: shuaiqi.guo
> Priority: Blocker
> Fix For: 2.3.5
>
> Attachments: HIVE-27744.patch
>
> The privileges check is skipped when using a dynamic partition write with
> part of the partition specified, as in the following example:
> {code:java}
> insert overwrite table test_privilege partition (`date` = '2023-09-27', hour)
> ... {code}
> Hive will execute it directly without checking write privileges.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27744) privileges check is skipped when using partly dynamic partition write.
[ https://issues.apache.org/jira/browse/HIVE-27744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shuaiqi.guo updated HIVE-27744:
---
Description:
The privileges check is skipped when using a dynamic partition write with part of the partition specified, as in the following example:
{code:java}
insert overwrite table test_privilege partition (`date` = '2023-09-27', hour) ...
{code}
Hive will execute it directly without checking write privileges.

was: The privileges check is skipped when using a dynamic partition write with part of the partition specified.

> privileges check is skipped when using partly dynamic partition write.
> --
>
> Key: HIVE-27744
> URL: https://issues.apache.org/jira/browse/HIVE-27744
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: All Versions
> Reporter: shuaiqi.guo
> Priority: Blocker
> Fix For: 2.3.5
>
> The privileges check is skipped when using a dynamic partition write with
> part of the partition specified, as in the following example:
> {code:java}
> insert overwrite table test_privilege partition (`date` = '2023-09-27', hour)
> ... {code}
> Hive will execute it directly without checking write privileges.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27744) privileges check is skipped when using partly dynamic partition write.
shuaiqi.guo created HIVE-27744:
--
Summary: privileges check is skipped when using partly dynamic partition write.
Key: HIVE-27744
URL: https://issues.apache.org/jira/browse/HIVE-27744
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: All Versions
Reporter: shuaiqi.guo
Fix For: 2.3.5

The privileges check is skipped when using a dynamic partition write with part of the partition specified.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27727) Materialized view query rewrite fails if query has decimal derived aggregate
[ https://issues.apache.org/jira/browse/HIVE-27727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa resolved HIVE-27727.
---
Resolution: Fixed

Merged to master. Thanks [~zabetak] for review.

> Materialized view query rewrite fails if query has decimal derived aggregate
>
> Key: HIVE-27727
> URL: https://issues.apache.org/jira/browse/HIVE-27727
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: cbo, materializedviews, pull-request-available
>
> {code}
> create table t1 (a int, b decimal(3,2)) stored as orc TBLPROPERTIES
> ('transactional'='true');
> create materialized view mv1 as
> select a, sum(b), count(b) from t1 group by a;
> explain cbo
> select a, avg(b) from t1 group by a;
> {code}
> The MV is not used:
> {code}
> CBO PLAN:
> HiveProject(a=[$0], _o__c1=[CAST(/($1, $2)):DECIMAL(7, 6)])
> HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[count($1)])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
> If the {{avg}} input is not decimal but, for example, {{int}}, the query plan is
> rewritten to use the MV

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27743) Semantic Search In Hive
Sreenath created HIVE-27743:
---
Summary: Semantic Search In Hive
Key: HIVE-27743
URL: https://issues.apache.org/jira/browse/HIVE-27743
Project: Hive
Issue Type: Wish
Reporter: Sreenath

Semantic search is a way for computers to understand the meaning behind words and phrases when you're searching for something. Instead of just looking for exact matches of keywords, it tries to figure out what you're really asking and provides results that are more relevant and meaningful to your question. It's like having a search engine that can understand what you mean, not just what you say, making it easier to find the information you're looking for. This ticket is a wish to have semantic search in Hive.

On the implementation side, semantic search uses an embedding model and any of the similarity distance functions. My proposal is to implement functions for on-the-fly calculation of the similarity distance between two values. Once we have them we could easily do semantic search as part of a where clause.
* E.g. (using a cosine similarity function): "WHERE cos_dist(region, 'europe') > 0.9". It could return records with regions like Scandinavia, Nordic, Baltic, etc.
* We could have functions that accept values as text or as vector embeddings.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
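The predicate proposed in the ticket can be prototyped as a plain UDF-style function. Since the example "cos_dist(region, 'europe') > 0.9" reads as a similarity score (higher means closer), the sketch below implements cosine similarity; the toy embeddings are assumptions standing in for a real embedding model, not an existing Hive API.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; a real implementation would call an
# embedding model to turn 'europe', 'scandinavia', ... into vectors.
embeddings = {
    'europe':      [0.9, 0.1, 0.0],
    'scandinavia': [0.8, 0.2, 0.1],
    'asia':        [0.1, 0.9, 0.0],
}

# Row-by-row equivalent of: WHERE cos_sim(region, 'europe') > 0.9
query = embeddings['europe']
matches = [region for region, vec in embeddings.items()
           if cos_sim(vec, query) > 0.9]
# matches == ['europe', 'scandinavia']; 'asia' scores ~0.22 and is filtered out
```

As a Hive UDF this would run per row inside the WHERE clause; for large tables a practical deployment would precompute and store the embedding vectors rather than embed text on the fly.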
[jira] [Updated] (HIVE-27739) Multiple issues with timestamps with timezone - can lead to data inconsistency
[ https://issues.apache.org/jira/browse/HIVE-27739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janos Kovacs updated HIVE-27739:
Summary: Multiple issues with timestamps with timezone - can lead to data inconsistency (was: Multiple issues with timestamps with timezone)

> Multiple issues with timestamps with timezone - can lead to data inconsistency
> --
>
> Key: HIVE-27739
> URL: https://issues.apache.org/jira/browse/HIVE-27739
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0-beta-1
> Reporter: Janos Kovacs
> Assignee: Zoltán Rátkai
> Priority: Major
>
> The following issues were found while testing timestamps with timezones:
> * CREATE TABLE fails with SemanticException when hive.local.time.zone is set
> to a different valid value in the session
> * Invalid timezone values (e.g. with a typo) are treated as UTC, which can lead to
> data consistency / loss issues
> * LOCAL is an invalid timezone value and is treated as UTC instead of as
> the system's timezone
> The issues are tracked as sub-tasks.
> In general, the base tests are:
> {noformat}
> SELECT
>   '\${system:user.timezone}' as os,
>   '\${hiveconf:hive.local.time.zone}' as hive,
>   'TZ' as branch,
>   tz as orig,
>   to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}') as to_utc,
>   from_utc_timestamp(to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}'),'Europe/Budapest') as to_bp,
>   from_utc_timestamp(to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}'),'America/Los_Angeles') as to_la
> FROM timestamptest;
> "
> {noformat}
>
> The results are:
> {noformat}
> +-----------------+-----------------+--------+---------------------------------------+-----------------------+-----------------------+-----------------------+
> | os              | hive            | branch | orig                                  | to_utc                | to_bp                 | to_la                 |
> +-----------------+-----------------+--------+---------------------------------------+-----------------------+-----------------------+-----------------------+
> | Europe/Budapest | Europe/Budapest | TZ     | 2016-01-03 21:26:34.0 Europe/Budapest | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 |
> | Europe/Budapest | UTC             | TZ     | 2016-01-03 20:26:34.0 UTC             | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 |
> | Europe/Budapest | LOCAL           | TZ     | 2016-01-03 21:26:34.0 Europe/Budapest | 2016-01-03 21:26:34.0 | 2016-01-03 22:26:34.0 | 2016-01-03 13:26:34.0 | !!!
> | UTC             | Europe/Budapest | TZ     | 2016-01-03 21:26:34.0 Europe/Budapest | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 |
> | UTC             | UTC             | TZ     | 2016-01-03 20:26:34.0 UTC             | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 |
> | UTC             | LOCAL           | TZ     | 2016-01-03 20:26:34.0 UTC             | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 | !!!
> +-----------------+-----------------+--------+---------------------------------------+-----------------------+-----------------------+-----------------------+
> {noformat}
>
> The problematic cases:
> * the "Europe/Budapest | LOCAL" case is wrong: LOCAL is treated as UTC
> instead of the system's TZ, which introduces a 1h offset when converted
> * the "UTC | LOCAL" case is only correct because LOCAL is treated as UTC all
> the time
> Repro code and more details are in each of the subtask tickets

-- This message was sent by Atlassian Jira (v8.20.10#820010)
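The rows flagged with !!! in HIVE-27739 above come from interpreting a Budapest wall-clock value as if it were UTC. A minimal Python sketch of the same conversion semantics, assuming the zoneinfo timezone database is available (this mirrors, not reuses, Hive's to_utc_timestamp):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Naive wall-clock value, as displayed: 2016-01-03 21:26:34 in Budapest.
wall = datetime(2016, 1, 3, 21, 26, 34)

def to_utc(ts, zone):
    """Interpret the naive timestamp in `zone`, then convert to UTC."""
    return ts.replace(tzinfo=ZoneInfo(zone)).astimezone(ZoneInfo('UTC'))

# Correct: Budapest is UTC+1 in winter, so 21:26:34 local is 20:26:34 UTC.
correct = to_utc(wall, 'Europe/Budapest')
# Buggy LOCAL handling: the wall-clock value is taken as already-UTC.
buggy = to_utc(wall, 'UTC')
# The difference is exactly the 1h error the !!! rows show.
offset_seconds = (buggy - correct).total_seconds()
```

Running this yields a 3600-second discrepancy, matching the one-hour shift between the "Europe/Budapest | LOCAL" row and the correct "Europe/Budapest | Europe/Budapest" row in the table.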
[jira] [Updated] (HIVE-27741) Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead to data consistency issues
[ https://issues.apache.org/jira/browse/HIVE-27741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janos Kovacs updated HIVE-27741:
Description:
When the timezone specified in the *to_utc_timestamp()* function is not valid, it is still treated as UTC instead of an error being thrown. If the user accidentally makes a typo - e.g. America/Los{color:#ff}*t*{color}_Angeles - the query runs successfully, returning an invalid converted value, which can lead to data consistency issues.

Repro code:
{noformat}
docker rm -f hive4
export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ="America/Los_Angeles"
export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS -Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 1:1 -p 10001:10001 -p 10002:10002 --env TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:1/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`, '\${system:user.timezone}' as \`system:user.timezone\`, '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest ( ts timestamp, tz timestamp with local time zone ) STORED AS TEXTFILE;
INSERT INTO timestamptest select TIMESTAMP'2016-01-03 12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';
SELECT
  tz as orig,
  to_utc_timestamp(tz, 'America/Los_Angeles') as utc_correct_tz,
  to_utc_timestamp(tz, 'Europe/HereIsATypo') as utc_incorrect_tz,
  to_utc_timestamp(tz, 'LOCAL') as utc_local_aslo_incorrect_tz,
  to_utc_timestamp(tz, 'UTC') as utc_tz
FROM timestamptest;
"
{noformat}

The results are:
{noformat}
+-----------------+----------------------+-------------------------------+
| env:tz          | system:user.timezone | hiveconf:hive.local.time.zone |
+-----------------+----------------------+-------------------------------+
| Europe/Budapest | Europe/Budapest      | America/Los_Angeles           |
+-----------------+----------------------+-------------------------------+
+-------------------------------------------+-----------------------+-----------------------+-----------------------------+-----------------------+
| orig                                      | utc_correct_tz        | utc_incorrect_tz      | utc_local_aslo_incorrect_tz | utc_tz                |
+-------------------------------------------+-----------------------+-----------------------+-----------------------------+-----------------------+
| 2016-01-03 12:26:34.0 America/Los_Angeles | 2016-01-03 20:26:34.0 | 2016-01-03 12:26:34.0 | 2016-01-03 12:26:34.0       | 2016-01-03 12:26:34.0 |
+-------------------------------------------+-----------------------+-----------------------+-----------------------------+-----------------------+
{noformat}

Note:
* the invalid timezone - utc_incorrect_tz - is treated as UTC
* LOCAL is also treated as UTC when it should in fact be treated as the system's timezone; but since LOCAL is also an invalid timezone value in hive4, it becomes UTC just like any other invalid and/or mistyped timezone value (see HIVE-27742)

Hive should throw an Exception in that case to let the user know that the provided timezone is wrong - at least this should be configurable, e.g. via something like {*}hive.strict.time.zone.check{*}.

was:
When the timezone specified in the *to_utc_timestamp()* function is not valid, it is still treated as UTC instead of an error being thrown. If the user accidentally makes a typo - e.g. America/Los{color:#ff}*t*{color}_Angeles - the query runs successfully, returning an invalid converted value, which can lead to data consistency issues.

Repro code:
{noformat}
docker rm -f hive4
export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ="America/Los_Angeles"
export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS -Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 1:1 -p 10001:10001 -p 10002:10002 --env TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:1/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`, '\${system:user.timezone}' as \`system:user.timezone\`, '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest ( ts timestamp, tz timestamp with local time zone ) STORED AS TEXTFIL
[jira] [Created] (HIVE-27742) LOCAL timezone value is treated as UTC instead of system's timezone which causes data consistency issues
Janos Kovacs created HIVE-27742:
---
Summary: LOCAL timezone value is treated as UTC instead of system's timezone which causes data consistency issues
Key: HIVE-27742
URL: https://issues.apache.org/jira/browse/HIVE-27742
Project: Hive
Issue Type: Sub-task
Affects Versions: 4.0.0-beta-1
Reporter: Janos Kovacs
Assignee: Zoltán Rátkai

The Hive configuration states:
{noformat}
HIVE_LOCAL_TIME_ZONE("hive.local.time.zone", "LOCAL",
    "Sets the time-zone for displaying and interpreting time stamps. If this property value is set to\n" +
    "LOCAL, it is not specified, or it is not a correct time-zone, the system default time-zone will be\n " +
    "used instead. Time-zone IDs can be specified as region-based zone IDs (based on IANA time-zone data),\n" +
    "abbreviated zone IDs, or offset IDs."),
{noformat}
But it seems that in hive4 (-beta) it is always treated as UTC - like any other invalid timezone value (see HIVE-27741).

Repro code:
{noformat}
docker rm -f hive4
export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ=${HS2_ENV_TZ}
export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS -Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 1:1 -p 10001:10001 -p 10002:10002 --env TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:1/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`, '\${system:user.timezone}' as \`system:user.timezone\`, '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest ( ts timestamp, tz timestamp with local time zone ) STORED AS TEXTFILE;
INSERT INTO timestamptest select TIMESTAMP'2016-01-03 12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';
SET hive.query.results.cache.enabled=false;
SET
hive.local.time.zone=LOCAL;
SELECT '\${env:TZ}' as \`env:TZ\`, '\${system:user.timezone}' as \`system:user.timezone\`, '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
SELECT
  'LOCAL' as tzset,
  tz as orig,
  to_utc_timestamp(tz, 'LOCAL') as utc_local,
  to_utc_timestamp(tz, 'Europe/Budapest') as utc_tz,
  from_utc_timestamp(to_utc_timestamp(tz,'LOCAL'),'Europe/Budapest') as to_bp
FROM timestamptest;
SET hive.local.time.zone=Europe/Budapest;
SELECT '\${env:TZ}' as \`env:TZ\`, '\${system:user.timezone}' as \`system:user.timezone\`, '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
SELECT
  'Europe/Budapest' as tzset,
  tz as orig,
  to_utc_timestamp(tz, 'LOCAL') as utc_local,
  to_utc_timestamp(tz, 'Europe/Budapest') as utc_tz,
  from_utc_timestamp(to_utc_timestamp(tz,'Europe/Budapest'),'Europe/Budapest') as to_bp
FROM timestamptest;
"
{noformat}

The results are:
{noformat}
+-----------------+----------------------+-------------------------------+
| env:tz          | system:user.timezone | hiveconf:hive.local.time.zone |
+-----------------+----------------------+-------------------------------+
| Europe/Budapest | Europe/Budapest      | LOCAL                         |
+-----------------+----------------------+-------------------------------+
+-------+---------------------------------------+-----------------------+-----------------------+-----------------------+
| tzset | orig                                  | utc_local             | utc_tz                | to_bp                 |
+-------+---------------------------------------+-----------------------+-----------------------+-----------------------+
| LOCAL | 2016-01-03 21:26:34.0 Europe/Budapest | 2016-01-03 21:26:34.0 | 2016-01-03 20:26:34.0 | 2016-01-03 22:26:34.0 |
+-------+---------------------------------------+-----------------------+-----------------------+-----------------------+
+-----------------+----------------------+-------------------------------+
| env:tz          | system:user.timezone | hiveconf:hive.local.time.zone |
+-----------------+----------------------+-------------------------------+
| Europe/Budapest | Europe/Budapest      | Europe/Budapest               |
+-----------------+----------------------+-------------------------------+
[jira] [Created] (HIVE-27741) Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead to data consistency issues
Janos Kovacs created HIVE-27741:
---
Summary: Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead to data consistency issues
Key: HIVE-27741
URL: https://issues.apache.org/jira/browse/HIVE-27741
Project: Hive
Issue Type: Sub-task
Affects Versions: 4.0.0-beta-1
Reporter: Janos Kovacs
Assignee: Zoltán Rátkai

When the timezone specified in the *to_utc_timestamp()* function is not valid, it is still treated as UTC instead of an error being thrown. If the user accidentally makes a typo - e.g. America/Los{color:#ff}*t*{color}_Angeles - the query runs successfully, returning an invalid converted value, which can lead to data consistency issues.

Repro code:
{noformat}
docker rm -f hive4
export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ="America/Los_Angeles"
export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS -Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 1:1 -p 10001:10001 -p 10002:10002 --env TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:1/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`, '\${system:user.timezone}' as \`system:user.timezone\`, '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest ( ts timestamp, tz timestamp with local time zone ) STORED AS TEXTFILE;
INSERT INTO timestamptest select TIMESTAMP'2016-01-03 12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';
SELECT
  tz as orig,
  to_utc_timestamp(tz, 'America/Los_Angeles') as utc_correct_tz,
  to_utc_timestamp(tz, 'Europe/HereIsATypo') as utc_incorrect_tz,
  to_utc_timestamp(tz, 'LOCAL') as utc_local_aslo_incorrect_tz,
  to_utc_timestamp(tz, 'UTC') as utc_tz
FROM timestamptest;
"
{noformat}

The results are:
{noformat}
+-----------------+----------------------+-------------------------------+
| env:tz          | system:user.timezone | hiveconf:hive.local.time.zone |
+-----------------+----------------------+-------------------------------+
| Europe/Budapest | Europe/Budapest      | America/Los_Angeles           |
+-----------------+----------------------+-------------------------------+
+-------------------------------------------+-----------------------+-----------------------+-----------------------------+-----------------------+
| orig                                      | utc_correct_tz        | utc_incorrect_tz      | utc_local_aslo_incorrect_tz | utc_tz                |
+-------------------------------------------+-----------------------+-----------------------+-----------------------------+-----------------------+
| 2016-01-03 12:26:34.0 America/Los_Angeles | 2016-01-03 20:26:34.0 | 2016-01-03 12:26:34.0 | 2016-01-03 12:26:34.0       | 2016-01-03 12:26:34.0 |
+-------------------------------------------+-----------------------+-----------------------+-----------------------------+-----------------------+
{noformat}

Note:
* the invalid timezone - utc_incorrect_tz - is treated as UTC
* LOCAL is also treated as UTC when it should in fact be treated as the system's timezone; but since LOCAL is also an invalid timezone value in hive4, it becomes UTC just like any other invalid and/or mistyped timezone value

Hive should throw an Exception in that case to let the user know that the provided timezone is wrong - at least this should be configurable, e.g. via something like {*}hive.strict.time.zone.check{*}.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
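The strict behaviour proposed in HIVE-27741 above ({*}hive.strict.time.zone.check{*} is the reporter's suggested name, not an existing Hive property) amounts to failing fast on unknown zone IDs instead of silently substituting UTC. Python's zoneinfo already rejects unknown IDs, which makes for a compact illustration of the intended semantics:

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def strict_zone(zone_id):
    """Resolve a zone ID, raising instead of silently falling back to UTC."""
    try:
        return ZoneInfo(zone_id)
    except (ZoneInfoNotFoundError, ValueError):
        # Fail fast with a clear message, mirroring the proposed strict check.
        raise ValueError(f'invalid time zone: {zone_id!r}')

strict_zone('America/Los_Angeles')       # resolves fine
try:
    strict_zone('America/Lost_Angeles')  # the typo from the report: rejected
except ValueError as err:
    print(err)
```

The lenient behaviour described in the ticket corresponds to swallowing the lookup failure and returning UTC instead; a configurable strict mode would raise exactly at the marked point, surfacing the typo at compile time rather than corrupting converted data.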
[jira] [Created] (HIVE-27740) CREATE TABLE with timestamp with timezone fails with SemanticException
Janos Kovacs created HIVE-27740:
---
Summary: CREATE TABLE with timestamp with timezone fails with SemanticException
Key: HIVE-27740
URL: https://issues.apache.org/jira/browse/HIVE-27740
Project: Hive
Issue Type: Sub-task
Affects Versions: 4.0.0-beta-1
Reporter: Janos Kovacs
Assignee: Zoltán Rátkai

CREATE TABLE with timestamp with timezone fails with SemanticException when the timezone is changed in a session to another valid value.

Repro code:
{noformat}
docker rm -f hive4
export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ=${HS2_ENV_TZ}
export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS -Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 1:1 -p 10001:10001 -p 10002:10002 --env TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:1/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`, '\${system:user.timezone}' as \`system:user.timezone\`, '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest ( ts timestamp, tz timestamp with local time zone DEFAULT TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles' ) STORED AS TEXTFILE;
INSERT INTO timestamptest (ts) VALUES (TIMESTAMP'2016-01-03 20:26:34');
SELECT ts, tz from timestamptest;
SET hive.local.time.zone=Europe/Berlin;
SELECT '\${env:TZ}' as \`env:TZ\`, '\${system:user.timezone}' as \`system:user.timezone\`, '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;
SELECT ts, tz from timestamptest;
DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest ( ts timestamp, tz timestamp with local time zone DEFAULT TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles' ) STORED AS TEXTFILE;
"
{noformat}

Querying the data works with both timezone
values:
{noformat}
+-----------------+----------------------+-------------------------------+
| env:tz          | system:user.timezone | hiveconf:hive.local.time.zone |
+-----------------+----------------------+-------------------------------+
| Europe/Budapest | Europe/Budapest      | Europe/Budapest               |
+-----------------+----------------------+-------------------------------+
+-----------------------+---------------------------------------+
| ts                    | tz                                    |
+-----------------------+---------------------------------------+
| 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 Europe/Budapest |
+-----------------------+---------------------------------------+
+-----------------+----------------------+-------------------------------+
| env:tz          | system:user.timezone | hiveconf:hive.local.time.zone |
+-----------------+----------------------+-------------------------------+
| Europe/Budapest | Europe/Budapest      | Europe/Berlin                 |
+-----------------+----------------------+-------------------------------+
+-----------------------+---------------------------------------+
| ts                    | tz                                    |
+-----------------------+---------------------------------------+
| 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 Europe/Berlin   |
+-----------------------+---------------------------------------+
{noformat}

CREATE also works with the system-set timezone value, but fails when it is changed. The second CREATE TABLE statement in the above example fails with:
{noformat}
Error: Error while compiling statement: FAILED: SemanticException [Error 10326]: Invalid Constraint syntax Invalid type: timestamp with local time zone for default value: TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles'. Please make sure that the type is compatible with column type: timestamp with local time zone (state=42000,code=10326)
{noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27739) Multiple issues with timestamps with timezone
Janos Kovacs created HIVE-27739:
---
Summary: Multiple issues with timestamps with timezone
Key: HIVE-27739
URL: https://issues.apache.org/jira/browse/HIVE-27739
Project: Hive
Issue Type: Bug
Affects Versions: 4.0.0-beta-1
Reporter: Janos Kovacs
Assignee: Zoltán Rátkai

The following issues were found while testing timestamps with timezones:
* CREATE TABLE fails with SemanticException when hive.local.time.zone is set to a different valid value in the session
* Invalid timezone values (e.g. with a typo) are treated as UTC, which can lead to data consistency / loss issues
* LOCAL is an invalid timezone value and is treated as UTC instead of as the system's timezone
The issues are tracked as sub-tasks.

In general, the base tests are:
{noformat}
SELECT
  '\${system:user.timezone}' as os,
  '\${hiveconf:hive.local.time.zone}' as hive,
  'TZ' as branch,
  tz as orig,
  to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}') as to_utc,
  from_utc_timestamp(to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}'),'Europe/Budapest') as to_bp,
  from_utc_timestamp(to_utc_timestamp(tz,'\${hiveconf:hive.local.time.zone}'),'America/Los_Angeles') as to_la
FROM timestamptest;
"
{noformat}

The results are:
{noformat}
+-----------------+-----------------+--------+---------------------------------------+-----------------------+-----------------------+-----------------------+
| os              | hive            | branch | orig                                  | to_utc                | to_bp                 | to_la                 |
+-----------------+-----------------+--------+---------------------------------------+-----------------------+-----------------------+-----------------------+
| Europe/Budapest | Europe/Budapest | TZ     | 2016-01-03 21:26:34.0 Europe/Budapest | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 |
| Europe/Budapest | UTC             | TZ     | 2016-01-03 20:26:34.0 UTC             | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 |
| Europe/Budapest | LOCAL           | TZ     | 2016-01-03 21:26:34.0 Europe/Budapest | 2016-01-03 21:26:34.0 | 2016-01-03 22:26:34.0 | 2016-01-03 13:26:34.0 | !!!
| UTC             | Europe/Budapest | TZ     | 2016-01-03 21:26:34.0 Europe/Budapest | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 |
| UTC             | UTC             | TZ     | 2016-01-03 20:26:34.0 UTC             | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 |
| UTC             | LOCAL           | TZ     | 2016-01-03 20:26:34.0 UTC             | 2016-01-03 20:26:34.0 | 2016-01-03 21:26:34.0 | 2016-01-03 12:26:34.0 | !!!
+-----------------+-----------------+--------+---------------------------------------+-----------------------+-----------------------+-----------------------+
{noformat}

The problematic cases:
* the "Europe/Budapest | LOCAL" case is wrong: LOCAL is treated as UTC instead of the system's TZ, which introduces a 1h offset when converted
* the "UTC | LOCAL" case is only correct because LOCAL is treated as UTC all the time

Repro code and more details are in each of the subtask tickets

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27738) Fix Schematool version so that it can pick up correct schema script file after 4.0.0-beta-1 release
[ https://issues.apache.org/jira/browse/HIVE-27738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27738:
--
Labels: pull-request-available (was: )

> Fix Schematool version so that it can pick up correct schema script file after
> 4.0.0-beta-1 release
> --
>
> Key: HIVE-27738
> URL: https://issues.apache.org/jira/browse/HIVE-27738
> Project: Hive
> Issue Type: Bug
> Reporter: KIRTI RUGE
> Assignee: KIRTI RUGE
> Priority: Major
> Labels: pull-request-available
>
> hive.version.shortname needs to be fixed from / pom.xml and
> standalone-metastore/pom.xml so that it should pick up xxx4.0.0-beta-2.xx.sql
> file correctly

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27738) Fix Schematool version so that it can pick up correct schema script file after 4.0.0-beta-1 release
[ https://issues.apache.org/jira/browse/HIVE-27738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KIRTI RUGE reassigned HIVE-27738:
-
Assignee: KIRTI RUGE

> Fix Schematool version so that it can pick up correct schema script file after
> 4.0.0-beta-1 release
> --
>
> Key: HIVE-27738
> URL: https://issues.apache.org/jira/browse/HIVE-27738
> Project: Hive
> Issue Type: Bug
> Reporter: KIRTI RUGE
> Assignee: KIRTI RUGE
> Priority: Major
>
> hive.version.shortname needs to be fixed from / pom.xml and
> standalone-metastore/pom.xml so that it should pick up xxx4.0.0-beta-2.xx.sql
> file correctly

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27738) Fix Schematool version so that it can pick up correct schema script file after 4.0.0-beta-1 release
KIRTI RUGE created HIVE-27738:
-
Summary: Fix Schematool version so that it can pick up correct schema script file after 4.0.0-beta-1 release
Key: HIVE-27738
URL: https://issues.apache.org/jira/browse/HIVE-27738
Project: Hive
Issue Type: Bug
Reporter: KIRTI RUGE

hive.version.shortname needs to be fixed from / pom.xml and standalone-metastore/pom.xml so that it should pick up xxx4.0.0-beta-2.xx.sql file correctly

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27733: -- Labels: pull-request-available (was: ) > Intermittent ConcurrentModificationException in HiveServer2 > --- > > Key: HIVE-27733 > URL: https://issues.apache.org/jira/browse/HIVE-27733 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Henri Biestro >Assignee: Henri Biestro >Priority: Major > Labels: pull-request-available > > Some tests sporadically fail with a cause that looks like: > {code} > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] > at java.util.AbstractCollection.toArray(AbstractCollection.java:200) > ~[?:?] > at com.google.common.collect.Iterables.toArray(Iterables.java:285) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) > 
~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) > ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > ... 51 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769153#comment-17769153 ] Henri Biestro edited comment on HIVE-27733 at 9/26/23 12:28 PM: Indeed, it is. But the code still encounters the race condition due to sharing the PerfLogger between threads. The problem still potentially occurs. was (Author: henrib): Indeed, it is. But the code still encounters the race condition due to sharing the PerfLogger between threads. > Intermittent ConcurrentModificationException in HiveServer2 > --- > > Key: HIVE-27733 > URL: https://issues.apache.org/jira/browse/HIVE-27733 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Henri Biestro >Assignee: Henri Biestro >Priority: Major > > Some tests sporadically fail with a cause that looks like: > {code} > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] > at java.util.AbstractCollection.toArray(AbstractCollection.java:200) > ~[?:?] 
> at com.google.common.collect.Iterables.toArray(Iterables.java:285) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) > ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > ... 51 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769153#comment-17769153 ] Henri Biestro commented on HIVE-27733: -- Indeed, it is. But the code still encounters the race condition due to sharing the PerfLogger between threads. > Intermittent ConcurrentModificationException in HiveServer2 > --- > > Key: HIVE-27733 > URL: https://issues.apache.org/jira/browse/HIVE-27733 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Henri Biestro >Assignee: Henri Biestro >Priority: Major > > Some tests sporadically fail with a cause that looks like: > {code} > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] > at java.util.AbstractCollection.toArray(AbstractCollection.java:200) > ~[?:?] 
> at com.google.common.collect.Iterables.toArray(Iterables.java:285) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) > ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > ... 51 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27737) Consider extending HIVE-17574 to aux jars
[ https://issues.apache.org/jira/browse/HIVE-27737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27737: Description: [HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024] was about an optimization, where HDFS-based resources optionally were localized directly from the "original" hdfs folder instead of a tez session dir. This reduced the HDFS overhead, by introducing hive.resource.use.hdfs.location, so there are 2 cases: 1. hive.resource.use.hdfs.location=true a) collect "HDFS temp files" and optimize their access: added files, added jars b) collect local temp files and use the non-optimized session-based approach: added files, added jars, aux jars, reloadable aux jars {code} // reference HDFS based resource directly, to use distribute cache efficiently. addHdfsResource(conf, tmpResources, LocalResourceType.FILE, getHdfsTempFilesFromConf(conf)); // local resources are session based. tmpResources.addAll( addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE, getLocalTempFilesFromConf(conf), null).values() ); {code} 2. hive.resource.use.hdfs.location=false a) original behavior: collect all jars in hs2's scope (added files, added jars, aux jars, reloadable aux jars) and put it to a session based directory {code} // all resources including HDFS are session based. 
tmpResources.addAll( addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE, getTempFilesFromConf(conf), null).values() ); {code} my proposal is related to 1) let's say a user is about to load an aux jar from hdfs and have it set in hive.aux.jars.path: {code} hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar {code} in this case: we can distinguish between file:// scheme resources and hdfs:// scheme resources: - file scheme resources should fall into 1b), still be used from session dir - hdfs scheme resources should fall into 1a), simply used by addHdfsResource this needs a bit of attention at every usage of aux jars, because aux jars are e.g. supposed to be classloaded to HS2 sessions, so in case of an hdfs resource, it should be taken care of was: [HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024] was about an optimization, where HDFS-based resources optionally were localized directly from the "original" hdfs folder instead of a tez session dir. This reduced the HDFS overhead, by introducing hive.resource.use.hdfs.location, so there are 2 cases: 1. hive.resource.use.hdfs.location=true a) collect "HDFS temp files" and optimize their access: added files, added jars b) collect local temp files and use the non-optimized session-based approach: added files, added jars, aux jars, reloadable aux jars {code} // reference HDFS based resource directly, to use distribute cache efficiently. addHdfsResource(conf, tmpResources, LocalResourceType.FILE, getHdfsTempFilesFromConf(conf)); // local resources are session based. tmpResources.addAll( addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE, getLocalTempFilesFromConf(conf), null).values() ); {code} 2. 
hive.resource.use.hdfs.location=false a) original behavior: collect all jars in hs2's scope (added files, added jars, aux jars, reloadable aux jars) and put it to a session based directory {code} // all resources including HDFS are session based. tmpResources.addAll( addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE, getTempFilesFromConf(conf), null).values() ); {code} my proposal is related to 1) let's say user is about to load an aux jar from hdfs and have it set in hive.aux.jars.path: {code} hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar {code} in this case: we can distinguish between file:// scheme resources and hdfs:// scheme resources: - file scheme resources should fall into 1b), still be used from session dir - hdfs scheme resources should fall into 1a), simply used by addHdfsResource > Consider extending HIVE-17574 to aux jars > - > > Key: HIVE-27737 > URL: https://issues.apache.org/jira/browse/HIVE-27737 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > > [HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024] > was about an optimization, where HDFS-based resources optionally were > localized directly from the "original" hdfs folder instead of a tez session > dir. This reduced the HDFS overhead, by introducing > hive.resource.use.hdfs.location, so there are 2 cases: > 1. hive.resource.use.hdfs.location=true > a)
[jira] [Updated] (HIVE-27737) Consider extending HIVE-17574 to aux jars
[ https://issues.apache.org/jira/browse/HIVE-27737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27737: Description: [HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024] was about an optimization, where HDFS-based resources optionally were localized directly from the "original" hdfs folder instead of a tez session dir. This reduced the HDFS overhead, by introducing hive.resource.use.hdfs.location, so there are 2 cases: 1. hive.resource.use.hdfs.location=true a) collect "HDFS temp files" and optimize their access: added files, added jars b) collect local temp files and use the non-optimized session-based approach: added files, added jars, aux jars, reloadable aux jars {code} // reference HDFS based resource directly, to use distribute cache efficiently. addHdfsResource(conf, tmpResources, LocalResourceType.FILE, getHdfsTempFilesFromConf(conf)); // local resources are session based. tmpResources.addAll( addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE, getLocalTempFilesFromConf(conf), null).values() ); {code} 2. hive.resource.use.hdfs.location=false a) original behavior: collect all jars in hs2's scope (added files, added jars, aux jars, reloadable aux jars) and put it to a session based directory {code} // all resources including HDFS are session based. 
tmpResources.addAll( addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE, getTempFilesFromConf(conf), null).values() ); {code} my proposal is related to 1) let's say user is about to load an aux jar from hdfs and have it set in hive.aux.jars.path: {code} hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar {code} in this case: we can distinguish between file:// scheme resources and hdfs:// scheme resources: - file scheme resources should fall into 1b), still be used from session dir - hdfs scheme resources should fall into 1a), simply used by addHdfsResource > Consider extending HIVE-17574 to aux jars > - > > Key: HIVE-27737 > URL: https://issues.apache.org/jira/browse/HIVE-27737 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > > [HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024] > was about an optimization, where HDFS-based resources optionally were > localized directly from the "original" hdfs folder instead of a tez session > dir. This reduced the HDFS overhead, by introducing > hive.resource.use.hdfs.location, so there are 2 cases: > 1. hive.resource.use.hdfs.location=true > a) collect "HDFS temp files" and optimize their access: added files, added > jars > b) collect local temp files and use the non-optimized session-based approach: > added files, added jars, aux jars, reloadable aux jars > {code} > // reference HDFS based resource directly, to use distribute cache > efficiently. > addHdfsResource(conf, tmpResources, LocalResourceType.FILE, > getHdfsTempFilesFromConf(conf)); > // local resources are session based. > tmpResources.addAll( > addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE, > getLocalTempFilesFromConf(conf), null).values() > ); > {code} > 2. 
hive.resource.use.hdfs.location=false > a) original behavior: collect all jars in hs2's scope (added files, added > jars, aux jars, reloadable aux jars) and put it to a session based directory > {code} > // all resources including HDFS are session based. > tmpResources.addAll( > addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE, > getTempFilesFromConf(conf), null).values() > ); > {code} > my proposal is related to 1) > let's say user is about to load an aux jar from hdfs and have it set in > hive.aux.jars.path: > {code} > hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar > {code} > in this case: we can distinguish between file:// scheme resources and hdfs:// > scheme resources: > - file scheme resources should fall into 1b), still be used from session dir > - hdfs scheme resources should fall into 1a), simply used by addHdfsResource > -- This message was sent by Atlassian Jira (v8.20.10#820010)
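The scheme-based routing proposed above can be sketched as follows. This is an illustrative sketch only: splitAuxJarsByScheme is a hypothetical helper, not an existing Hive method; it merely shows how hive.aux.jars.path entries could be split by URI scheme, sending file:// entries down the session-dir localization path (case 1b) and hdfs:// entries down the direct addHdfsResource path (case 1a).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (splitAuxJarsByScheme is NOT an existing Hive method):
// partition the comma-separated hive.aux.jars.path value by URI scheme, so
// file:// entries keep the session-dir localization (case 1b) while hdfs://
// entries could be fed directly to addHdfsResource (case 1a).
public class AuxJarSchemeSplit {
    /** Returns [localEntries, hdfsEntries]. */
    public static List<List<String>> splitAuxJarsByScheme(String auxJarsPath) {
        List<String> local = new ArrayList<>();
        List<String> hdfs = new ArrayList<>();
        for (String raw : auxJarsPath.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            String scheme = URI.create(entry).getScheme();
            if ("hdfs".equals(scheme)) {
                hdfs.add(entry);   // 1a: reference directly via the distributed cache
            } else {
                local.add(entry);  // 1b: localize through the session directory
            }
        }
        return List.of(local, hdfs);
    }

    public static void main(String[] args) {
        List<List<String>> split = splitAuxJarsByScheme(
                "file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar");
        System.out.println("session-dir (1b): " + split.get(0));
        System.out.println("direct hdfs (1a): " + split.get(1));
    }
}
```

As the description notes, the real change would also have to account for classloading of hdfs-scheme aux jars into HS2 sessions, which this sketch does not address.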
[jira] [Created] (HIVE-27737) Extending HIVE-17574 to aux jars
László Bodor created HIVE-27737: --- Summary: Extending HIVE-17574 to aux jars Key: HIVE-27737 URL: https://issues.apache.org/jira/browse/HIVE-27737 Project: Hive Issue Type: Improvement Reporter: László Bodor
[jira] [Updated] (HIVE-27737) Consider extending HIVE-17574 to aux jars
[ https://issues.apache.org/jira/browse/HIVE-27737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27737: Summary: Consider extending HIVE-17574 to aux jars (was: Extending HIVE-17574 to aux jars) > Consider extending HIVE-17574 to aux jars > - > > Key: HIVE-27737 > URL: https://issues.apache.org/jira/browse/HIVE-27737 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27722) Added org.bouncycastle as dependency and excluded the dependencies of the same which lead to version mismatch while running tests
[ https://issues.apache.org/jira/browse/HIVE-27722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-27722. - Fix Version/s: 3.2.0 Resolution: Fixed > Added org.bouncycastle as dependency and excluded the dependencies of the > same which lead to version mismatch while running tests > - > > Key: HIVE-27722 > URL: https://issues.apache.org/jira/browse/HIVE-27722 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27704) Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
[ https://issues.apache.org/jira/browse/HIVE-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reassigned HIVE-27704: --- Assignee: KIRTI RUGE > Remove PowerMock from jdbc-handler and upgrade mockito to 4.11 > -- > > Key: HIVE-27704 > URL: https://issues.apache.org/jira/browse/HIVE-27704 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Zsolt Miskolczi >Assignee: KIRTI RUGE >Priority: Major > Labels: newbie, pull-request-available, starter > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27736) Remove PowerMock from itests-jmh and upgrade mockito
Ayush Saxena created HIVE-27736: --- Summary: Remove PowerMock from itests-jmh and upgrade mockito Key: HIVE-27736 URL: https://issues.apache.org/jira/browse/HIVE-27736 Project: Hive Issue Type: Sub-task Reporter: Ayush Saxena Assignee: Zsolt Miskolczi Remove PowerMock from the itests-jmh module
[jira] [Updated] (HIVE-27701) Remove PowerMock from llap-client and upgrade mockito to 4.11
[ https://issues.apache.org/jira/browse/HIVE-27701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-27701: Parent: HIVE-27735 Issue Type: Sub-task (was: Task) > Remove PowerMock from llap-client and upgrade mockito to 4.11 > - > > Key: HIVE-27701 > URL: https://issues.apache.org/jira/browse/HIVE-27701 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Zsolt Miskolczi >Priority: Major > Labels: newbie, starter > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27704) Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
[ https://issues.apache.org/jira/browse/HIVE-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-27704: Parent: HIVE-27735 Issue Type: Sub-task (was: Task) > Remove PowerMock from jdbc-handler and upgrade mockito to 4.11 > -- > > Key: HIVE-27704 > URL: https://issues.apache.org/jira/browse/HIVE-27704 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Zsolt Miskolczi >Priority: Major > Labels: newbie, pull-request-available, starter > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27705) Remove PowerMock from service (hive-service) and upgrade mockito to 4.11
[ https://issues.apache.org/jira/browse/HIVE-27705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-27705: Parent: HIVE-27735 Issue Type: Sub-task (was: Task) > Remove PowerMock from service (hive-service) and upgrade mockito to 4.11 > > > Key: HIVE-27705 > URL: https://issues.apache.org/jira/browse/HIVE-27705 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Zsolt Miskolczi >Priority: Major > Labels: newbie, starter > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27702) Remove PowerMock from beeline and upgrade mockito to 4.11
[ https://issues.apache.org/jira/browse/HIVE-27702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-27702: Parent: HIVE-27735 Issue Type: Sub-task (was: Task) > Remove PowerMock from beeline and upgrade mockito to 4.11 > - > > Key: HIVE-27702 > URL: https://issues.apache.org/jira/browse/HIVE-27702 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Zsolt Miskolczi >Assignee: Mayank Kunwar >Priority: Major > Labels: newbie, pull-request-available, starter > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26455) Remove PowerMockito from hive-exec
[ https://issues.apache.org/jira/browse/HIVE-26455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-26455: Parent: HIVE-27735 Issue Type: Sub-task (was: Improvement) > Remove PowerMockito from hive-exec > -- > > Key: HIVE-26455 > URL: https://issues.apache.org/jira/browse/HIVE-26455 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Zsolt Miskolczi >Assignee: Zsolt Miskolczi >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > PowerMockito is a mockito extension that introduces some painful points. > The main intention behind that is to be able to do static mocking. Since its > release, mockito-inline has been released, as a part of the mockito-core. > It doesn't require vintage test runner to be able to run and it can mock > objects with their own thread. > The goal is to stop using PowerMockito and use mockito-inline instead. > > The affected packages are: > * org.apache.hadoop.hive.ql.exec.repl > * org.apache.hadoop.hive.ql.exec.repl.bootstrap.load > * org.apache.hadoop.hive.ql.exec.repl.ranger; > * org.apache.hadoop.hive.ql.exec.util > * org.apache.hadoop.hive.ql.parse.repl > * org.apache.hadoop.hive.ql.parse.repl.load.message > * org.apache.hadoop.hive.ql.parse.repl.metric > * org.apache.hadoop.hive.ql.txn.compactor > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27735) Remove Powermock
Ayush Saxena created HIVE-27735: --- Summary: Remove Powermock Key: HIVE-27735 URL: https://issues.apache.org/jira/browse/HIVE-27735 Project: Hive Issue Type: Task Reporter: Ayush Saxena PowerMock has compatibility issues with newer Java versions and can be replaced with Mockito
[jira] [Created] (HIVE-27734) Add Iceberg's storage-partitioned join capabilities to Hive's [sorted-]bucket-map-join
Janos Kovacs created HIVE-27734: --- Summary: Add Iceberg's storage-partitioned join capabilities to Hive's [sorted-]bucket-map-join Key: HIVE-27734 URL: https://issues.apache.org/jira/browse/HIVE-27734 Project: Hive Issue Type: Improvement Components: Iceberg integration Affects Versions: 4.0.0-alpha-2 Reporter: Janos Kovacs Iceberg's 'data bucketing' is implemented through its rich (function based) partitioning feature, which helps optimize join operations - so-called storage-partitioned joins. doc: [https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE/edit#heading=h.82w8qxfl2uwl] spark impl.: https://issues.apache.org/jira/browse/SPARK-37375 This feature is not yet leveraged by Hive's bucket-map-join optimization, neither alone nor combined with Iceberg's SortOrder for sorted-bucket-map-join. Customers migrating from the Hive table format to the Iceberg format with a storage-optimized schema will experience performance degradation on large tables, where Iceberg's no-listing performance gain is significantly smaller than the join performance previously gained from bucket-join or even sorted-bucket-join. 
{noformat} SET hive.query.results.cache.enabled=false; SET hive.fetch.task.conversion = none; SET hive.optimize.bucketmapjoin=true; SET hive.convert.join.bucket.mapjoin.tez=true; SET hive.auto.convert.join.noconditionaltask.size=1000; --if you are working with external table, you need this for bmj: SET hive.disable.unsafe.external.table.operations=false; -- HIVE BUCKET-MAP-JOIN DROP TABLE IF EXISTS default.hivebmjt1 PURGE; DROP TABLE IF EXISTS default.hivebmjt2 PURGE; CREATE TABLE default.hivebmjt1 (id int, txt string) CLUSTERED BY (id) INTO 8 BUCKETS; CREATE TABLE default.hivebmjt2 (id int, txt string); INSERT INTO default.hivebmjt1 VALUES (1,'1'),(2,'2'),(3,'3'),(4,'4'),(5,'5'),(6,'6'),(7,'7'),(8,'8'); INSERT INTO default.hivebmjt2 VALUES (1,'1'),(2,'2'),(3,'3'),(4,'4'); EXPLAIN SELECT * FROM default.hivebmjt1 f INNER JOIN default.hivebmjt2 d ON f.id = d.id; EXPLAIN SELECT * FROM default.hivebmjt1 f LEFT OUTER JOIN default.hivebmjt2 d ON f.id = d.id; -- Both are optimized into BMJ -- ICEBERG BUCKET-MAP-JOIN via Iceberg's storage-partitioned join DROP TABLE IF EXISTS default.icespbmjt1 PURGE; DROP TABLE IF EXISTS default.icespbmjt2 PURGE; CREATE TABLE default.icespbmjt1 (txt string) PARTITIONED BY (id int) STORED BY ICEBERG ; CREATE TABLE default.icespbmjt2 (txt string) PARTITIONED BY (id int) STORED BY ICEBERG ; INSERT INTO default.icespbmjt1 VALUES ('1',1),('2',2),('3',3),('4',4); INSERT INTO default.icespbmjt2 VALUES ('1',1),('2',2),('3',3),('4',4); EXPLAIN SELECT * FROM default.icespbmjt1 f INNER JOIN default.icespbmjt2 d ON f.id = d.id; EXPLAIN SELECT * FROM default.icespbmjt1 f LEFT OUTER JOIN default.icespbmjt2 d ON f.id = d.id; -- Only Map-Join optimised {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769064#comment-17769064 ] Stamatis Zampetakis commented on HIVE-27733: This seems to be a duplicate of HIVE-18928, HIVE-19133. > Intermittent ConcurrentModificationException in HiveServer2 > --- > > Key: HIVE-27733 > URL: https://issues.apache.org/jira/browse/HIVE-27733 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Henri Biestro >Assignee: Henri Biestro >Priority: Major > > Some tests sporadically fail with a cause that looks like: > {code} > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] > at java.util.AbstractCollection.toArray(AbstractCollection.java:200) > ~[?:?] > at com.google.common.collect.Iterables.toArray(Iterables.java:285) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) > ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > ... 51 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henri Biestro updated HIVE-27733: - Description: Some tests sporadically fail with a cause that looks like: {code} Caused by: java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] at java.util.AbstractCollection.toArray(AbstractCollection.java:200) ~[?:?] at com.google.common.collect.Iterables.toArray(Iterables.java:285) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] ... 51 more {code} was: Some tests sporadically fail with a cause that looks like: Caused by: java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] 
at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] at java.util.AbstractCollection.toArray(AbstractCollection.java:200) ~[?:?] at com.google.common.collect.Iterables.toArray(Iterables.java:285) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] ... 51 more > Intermittent ConcurrentModificationException in HiveServer2 > --- > > Key: HIVE-27733 > URL: https://issues.apache.org/jira/browse/HIVE-27733 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Henri Biestro >Assignee: Henri Biestro >Priority: Major > > Some tests sporadically fail with a cause that looks like: > {code} > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] 
> at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] > at java.util.AbstractCollection.toArray(AbstractCollection.java:200) > ~[?:?] > at com.google.common.collect.Iterables.toArray(Iterables.java:285) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) > ~[hive-exec-3.1.3000.7.1.8.0-
[jira] [Commented] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769055#comment-17769055 ] Henri Biestro commented on HIVE-27733: -- It seems to indicate the PerfLogger endTimes map gets modified concurrently with its copy; note that the class is not thread-safe since it is (originally) expected to be used through a ThreadLocal and static methods. However, PerfLogger.setPerfLogger, which sets that thread-local member, is called by SQLOperation.BackgroundWork and shares the (parent) PerfLogger instance, essentially creating a potential race condition between the parent and background workers. I'd suggest using a ConcurrentHashMap instead of a HashMap as a blind fix for the internal endTimes/startTimes. ... > Intermittent ConcurrentModificationException in HiveServer2 > --- > > Key: HIVE-27733 > URL: https://issues.apache.org/jira/browse/HIVE-27733 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Henri Biestro >Assignee: Henri Biestro >Priority: Major > > Some tests sporadically fail with a cause that looks like: > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] > at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] > at java.util.AbstractCollection.toArray(AbstractCollection.java:200) > ~[?:?] 
> at com.google.common.collect.Iterables.toArray(Iterables.java:285) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) > ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) > ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] > ... 51 more -- This message was sent by Atlassian Jira (v8.20.10#820010)
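The failure mode, and why the suggested ConcurrentHashMap swap avoids it, can be shown with a minimal single-threaded sketch (this is not Hive code; the map stands in for the shared PerfLogger's startTimes/endTimes). Structurally modifying a HashMap while iterating its entry set, as a background worker could do concurrently with the ImmutableMap.copyOf in getEndTimes, trips the fail-fast iterator, whereas ConcurrentHashMap's weakly consistent iterator tolerates concurrent updates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: stands in for a PerfLogger times map that a
// background worker mutates while the parent thread copies it.
public class PerfTimesCopyDemo {
    // Iterates the map while inserting a new key mid-iteration; returns
    // true if iteration completes, false if the fail-fast iterator throws.
    static boolean copyWhileMutating(Map<String, Long> times) {
        times.put("compile", 1L);
        times.put("parse", 2L);
        try {
            for (Map.Entry<String, Long> e : times.entrySet()) {
                // Structural modification mid-iteration, as a racing
                // background thread could do to a shared map.
                times.put("drain", e.getValue());
            }
            return true;  // weakly consistent iterator tolerated the insert
        } catch (java.util.ConcurrentModificationException cme) {
            return false; // HashMap's fail-fast iterator detected the insert
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap survives: "
                + copyWhileMutating(new HashMap<>()));
        System.out.println("ConcurrentHashMap survives: "
                + copyWhileMutating(new ConcurrentHashMap<>()));
    }
}
```

Note this only removes the copy-time exception; it does not remove the underlying sharing of the PerfLogger between the parent and background threads discussed above.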
[jira] [Created] (HIVE-27733) Intermittent ConcurrentModificationException in HiveServer2
Henri Biestro created HIVE-27733: Summary: Intermittent ConcurrentModificationException in HiveServer2 Key: HIVE-27733 URL: https://issues.apache.org/jira/browse/HIVE-27733 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0-beta-1 Reporter: Henri Biestro Assignee: Henri Biestro Some tests sporadically fail with a cause that looks like: Caused by: java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?] at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?] at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?] at java.util.AbstractCollection.toArray(AbstractCollection.java:200) ~[?:?] at com.google.common.collect.Iterables.toArray(Iterables.java:285) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:451) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:436) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:227) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:629) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:560) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:554) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) ~[hive-exec-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) ~[hive-service-3.1.3000.7.1.8.0-774.jar:3.1.3000.7.1.8.0-774] ... 51 more -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27731) Perform metadata delete when only static filters are present
[ https://issues.apache.org/jira/browse/HIVE-27731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27731: -- Labels: pull-request-available (was: ) > Perform metadata delete when only static filters are present > > > Key: HIVE-27731 > URL: https://issues.apache.org/jira/browse/HIVE-27731 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > > When the query has only static filters, try to perform a metadata delete > directly rather than falling back to a positional delete. > Some relevant use cases where metadata deletes can be used - > {code:java} > DELETE FROM ice_table where id = 1;{code} > As seen above, the only filter is (id = 1). In scenarios where the filter > corresponds to a partition column, a metadata delete is more efficient and > does not generate additional files. > For partition evolution cases, if it is not possible to perform a metadata > delete, a positional delete is done. > Another optimisation that can be seen here is utilizing vectorized > expressions for UDFs that provide them, such as year - > {code:java} > DELETE FROM ice_table where id = 1 AND year(datecol) = 2015;{code} > Delete queries with multi-table scans will not be optimized using this method > since the determination of where clauses happens at runtime. > A similar optimisation is seen in Spark, where a metadata delete is done > whenever possible - > [https://github.com/apache/iceberg/blob/master/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L297-L389] -- This message was sent by Atlassian Jira (v8.20.10#820010)
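The eligibility rule the issue describes can be sketched as a small planner decision: drop data from metadata only when every filter column is a partition column and neither partition evolution nor a multi-table scan is involved, otherwise fall back to positional deletes. This is a hypothetical illustration of the decision, not the HIVE-27731 patch; all names below are invented:

```java
import java.util.Set;

public class DeleteModePlanner {
    enum Mode { METADATA_DELETE, POSITIONAL_DELETE }

    // Metadata delete drops whole partitions/files from table metadata, so it
    // is only safe when the static filters align exactly with partitioning.
    static Mode choose(Set<String> filterColumns, Set<String> partitionColumns,
                       boolean hasPartitionEvolution, boolean multiTableScan) {
        if (!multiTableScan && !hasPartitionEvolution
                && partitionColumns.containsAll(filterColumns)) {
            return Mode.METADATA_DELETE;
        }
        return Mode.POSITIONAL_DELETE;
    }

    public static void main(String[] args) {
        // DELETE FROM ice_table WHERE id = 1; table partitioned by id.
        System.out.println(choose(Set.of("id"), Set.of("id"), false, false));
        // DELETE ... WHERE id = 1 AND year(datecol) = 2015; datecol is not a
        // partition column, so rows inside files must be marked individually.
        System.out.println(choose(Set.of("id", "datecol"), Set.of("id"), false, false));
    }
}
```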
[jira] [Updated] (HIVE-27732) Backward compatibility for Hive with Components like Spark
[ https://issues.apache.org/jira/browse/HIVE-27732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Raj updated HIVE-27732: Description: Added additional functions for OSS Spark 3.3 and HMS 3.1.2 Compatibility. These are the functions used by Spark when it integrates with Hive : List of all functions called by HiveShim.scala. This can be found in Spark 3.3 codebase. 1. hive.dropDatabase(dbName, deleteData, ignoreUnknownDb, cascade) 2. hive.alterDatabase(dbName, d) 3. hive.getDatabase(dbName) 4. hive.getAllDatabases.asScala.toSeq 5. hive.getDatabasesByPattern(pattern).asScala.toSeq 6. hive.databaseExists(dbName) 7. getAllPartitionsMethod.invoke(hive, table) 8. getPartitionsByFilterMethod.invoke(hive, table, filter) 9. alterTableMethod.invoke(hive, tableName, table, environmentContextInAlterTable) 10. alterPartitionsMethod.invoke(hive, tableName, newParts, environmentContextInAlterTable) 11. hive.createTable(table, ifNotExists) 12. hive.getTable(database, tableName) 13. hive.getTable(dbName, tableName, throwException) 14. hive.getTable(tableName) 15. getTablesByTypeMethod.invoke(hive, dbName, pattern, tableType) 16. hive.getTablesByPattern(dbName, pattern).asScala.toSeq 17. hive.getAllTables(dbName).asScala.toSeq 18. hive.dropTable(dbName, tableName, deleteData, ignoreIfNotExists) 19. hive.dropTable(dbName, tableName) 20. dropTableMethod.invoke(hive, dbName, tableName, deleteData: JBoolean,ignoreIfNotExists: JBoolean, purge: JBoolean) 21. hive.getPartition(table, partSpec, forceCreate) 22. hive.getPartitions(table, partSpec).asScala.toSeq 23. hive.getPartitionNames(dbName, tableName, max).asScala.toSeq 24. hive.getPartitionNames(dbName, tableName, partSpec, max).asScala.toSeq 25. createPartitionMethod.invoke( hive, table, spec, location, params, // partParams null, // inputFormat null, // outputFormat -1: JInteger, // numBuckets null, // cols null, // serializationLib null, // serdeParams null, // bucketCols null) // sortCols } 26. 
hive.createPartitions(addPartitionDesc) 27. loadPartitionMethod.invoke(hive, loadPath, tableName, partSpec, replace: JBoolean, inheritTableSpecs: JBoolean, isSkewedStoreAsSubdir: JBoolean, isSrcLocal: JBoolean, isAcid, hasFollowingStatsTask) 28. hive.renamePartition(table, oldPartSpec, newPart) 29. loadTableMethod.invoke(hive, loadPath, tableName, loadFileType.get, isSrcLocal: JBoolean, isSkewedStoreAsSubdir, isAcidIUDoperation, hasFollowingStatsTask, writeIdInLoadTableOrPartition, stmtIdInLoadTableOrPartition: JInteger, replace: JBoolean) 30. loadDynamicPartitionsMethod.invoke(hive, loadPath, tableName, partSpec, loadFileType.get, numDP: JInteger, listBucketingLevel, isAcid, writeIdInLoadTableOrPartition, stmtIdInLoadTableOrPartition, hasFollowingStatsTask, AcidUtils.Operation.NOT_ACID, replace: JBoolean) 31. hive.createFunction(toHiveFunction(func, db)) 32. hive.dropFunction(db, name) 33. hive.alterFunction(db, oldName, hiveFunc) 34. hive.getFunctions(db, pattern).asScala.toSeq 35. dropIndexMethod.invoke(hive, dbName, tableName, indexName, throwExceptionInDropIndex, deleteDataInDropIndex) > Backward compatibility for Hive with Components like Spark > -- > > Key: HIVE-27732 > URL: https://issues.apache.org/jira/browse/HIVE-27732 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > > Added additional functions for OSS Spark 3.3 and HMS 3.1.2 Compatibility. > These are the functions used by Spark when it integrates with Hive : > > List of all functions called by HiveShim.scala. This can be found in Spark > 3.3 codebase. > 1. hive.dropDatabase(dbName, deleteData, ignoreUnknownDb, cascade) > 2. hive.alterDatabase(dbName, d) > 3. hive.getDatabase(dbName) > 4. hive.getAllDatabases.asScala.toSeq > 5. hive.getDatabasesByPattern(pattern).asScala.toSeq > 6. hive.databaseExists(dbName) > 7. getAllPartitionsMethod.invoke(hive, table) > 8. getPartitionsByFilterMethod.invoke(hive, table, filter) > 9. 
alterTableMethod.invoke(hive, tableName, table, > environmentContextInAlterTable) > 10. alterPartitionsMethod.invoke(hive, tableName, newParts, > environmentContextInAlterTable) > 11. hive.createTable(table, ifNotExists) > 12. hive.getTable(database, tableName) > 13. hive.getTable(dbName, tableName, throwException) > 14. hive.getTable(tableName) > 15. getTablesByTypeMethod.invoke(hive, dbName, pattern, tableType) > 16. hive.getTablesByPattern(dbName, pattern).asScala.toSeq > 17. hive.getAllTables(dbName).asScala.toSeq > 18. hive.dropTable(dbName, tableName, deleteData, ignoreIfNotExists) > 19. hive.dropTable(dbNa
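The `*Method.invoke` entries in the list above reflect the shim pattern: Spark resolves Hive client methods reflectively at runtime so one Spark build can work against several Hive versions whose method signatures differ. A minimal sketch of that lookup-and-invoke pattern, demonstrated against a stand-in class rather than the real org.apache.hadoop.hive.ql.metadata.Hive (the helper name here is invented):

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class ShimSketch {
    // Resolve a method by name and signature, failing loudly when the target
    // class (e.g. a given Hive version) does not expose it.
    static Method findMethod(Class<?> klass, String name, Class<?>... args) {
        try {
            return klass.getMethod(name, args);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("target class lacks " + name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in target; the real shim would resolve methods like
        // loadTable or dropTable on the Hive client class.
        Method add = findMethod(ArrayList.class, "add", Object.class);
        List<String> dbs = new ArrayList<>();
        add.invoke(dbs, "default");
        System.out.println(dbs);
    }
}
```

Because resolution happens by name at runtime, adding or changing a Hive method signature silently breaks callers like this list until the shim is updated, which is what the backward-compatibility work tracks.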
[jira] [Created] (HIVE-27732) Backward compatibility for Hive with Components like Spark
Aman Raj created HIVE-27732: --- Summary: Backward compatibility for Hive with Components like Spark Key: HIVE-27732 URL: https://issues.apache.org/jira/browse/HIVE-27732 Project: Hive Issue Type: Sub-task Affects Versions: 3.2.0 Reporter: Aman Raj Assignee: Aman Raj -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27731) Perform metadata delete when only static filters are present
Sourabh Badhya created HIVE-27731: - Summary: Perform metadata delete when only static filters are present Key: HIVE-27731 URL: https://issues.apache.org/jira/browse/HIVE-27731 Project: Hive Issue Type: Improvement Reporter: Sourabh Badhya Assignee: Sourabh Badhya When the query has only static filters, try to perform a metadata delete directly rather than falling back to a positional delete. Some relevant use cases where metadata deletes can be used - {code:java} DELETE FROM ice_table where id = 1;{code} As seen above, the only filter is (id = 1). In scenarios where the filter corresponds to a partition column, a metadata delete is more efficient and does not generate additional files. For partition evolution cases, if it is not possible to perform a metadata delete, a positional delete is done. Another optimisation that can be seen here is utilizing vectorized expressions for UDFs that provide them, such as year - {code:java} DELETE FROM ice_table where id = 1 AND year(datecol) = 2015;{code} Delete queries with multi-table scans will not be optimized using this method since the determination of where clauses happens at runtime. A similar optimisation is seen in Spark, where a metadata delete is done whenever possible - [https://github.com/apache/iceberg/blob/master/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L297-L389] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-21100: Priority: Major (was: Minor) > Allow flattening of table subdirectories resulted when using TEZ engine and > UNION clause > > > Key: HIVE-21100 > URL: https://issues.apache.org/jira/browse/HIVE-21100 > Project: Hive > Issue Type: Improvement >Reporter: George Pachitariu >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, > HIVE-21100.3.patch, HIVE-21100.patch > > Time Spent: 2.5h > Remaining Estimate: 0h > > Right now, when writing data into a table with the Tez engine and UNION ALL > as the last step of the query, Hive on Tez will create a > subdirectory for each branch of the UNION ALL. > With this patch the subdirectories are removed, and the files are renamed and > moved to the parent directory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
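The flattening the patch describes can be sketched with plain java.nio: move every file out of the per-branch subdirectories into the table directory, rename to avoid collisions between branches, then drop the empty subdirectories. This is an illustration under the assumption that the branch directories follow Hive's HIVE_UNION_SUBDIR_* naming; it is not the actual patch code, which operates on HDFS paths inside Hive's move logic:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenUnionDirs {
    static void flatten(Path tableDir) throws IOException {
        List<Path> subDirs;
        try (Stream<Path> s = Files.list(tableDir)) {
            subDirs = s.filter(p -> Files.isDirectory(p)
                    && p.getFileName().toString().startsWith("HIVE_UNION_SUBDIR_"))
                    .collect(Collectors.toList());
        }
        for (Path sub : subDirs) {
            List<Path> files;
            try (Stream<Path> s = Files.list(sub)) {
                files = s.collect(Collectors.toList());
            }
            for (Path f : files) {
                // Prefix with the subdir name so 000000_0 produced by two
                // UNION ALL branches cannot clash in the parent directory.
                Files.move(f, tableDir.resolve(sub.getFileName() + "_" + f.getFileName()));
            }
            Files.delete(sub); // subdirectory is now empty
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tbl");
        Files.createDirectory(dir.resolve("HIVE_UNION_SUBDIR_1"));
        Files.createDirectory(dir.resolve("HIVE_UNION_SUBDIR_2"));
        Files.writeString(dir.resolve("HIVE_UNION_SUBDIR_1/000000_0"), "a");
        Files.writeString(dir.resolve("HIVE_UNION_SUBDIR_2/000000_0"), "b");
        flatten(dir);
        try (Stream<Path> s = Files.list(dir)) {
            s.map(p -> p.getFileName().toString()).sorted().forEach(System.out::println);
        }
    }
}
```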
[jira] [Created] (HIVE-27730) Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.4
Ayush Saxena created HIVE-27730: --- Summary: Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.4 Key: HIVE-27730 URL: https://issues.apache.org/jira/browse/HIVE-27730 Project: Hive Issue Type: Bug Reporter: Ayush Saxena PR from [dependabot|https://github.com/apps/dependabot]: [https://github.com/apache/hive/pull/4746] -- This message was sent by Atlassian Jira (v8.20.10#820010)