[jira] [Resolved] (HIVE-28188) Upgrade PostGres to 42.7.3
[ https://issues.apache.org/jira/browse/HIVE-28188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Butao Zhang resolved HIVE-28188. Fix Version/s: 4.1.0 Resolution: Fixed Fix has been merged into master branch. Thanks [~devaspatikrishnatri] for the patch!!! > Upgrade PostGres to 42.7.3 > -- > > Key: HIVE-28188 > URL: https://issues.apache.org/jira/browse/HIVE-28188 > Project: Hive > Issue Type: Task > Components: Hive >Reporter: Devaspati Krishnatri >Assignee: Devaspati Krishnatri >Priority: Major > Labels: Security, pull-request-available > Fix For: 4.1.0 > > Attachments: mvn_dependency_tree.txt > > > Upgrade Postgres to 42.7.3 to target critical CVEs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table
[ https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-28213:
------------------------------
Description:
Insert-select is not honoring bucketing if both the source & target are bucketed on the same column.
e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) % 32 should go to the 29th bucket file. bucketing_table1 has id=1 at the 29th file, but bucketing_table2 doesn't have a 29th file because the insert-select didn't honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1  // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0  // incorrect result

select *, INPUT__FILE__NAME from bucketing_table1;
+----------------------+-------------------------+
| bucketing_table1.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table1/04_0  |
| 3                    | /bucketing_table1/06_0  |
| 5                    | /bucketing_table1/15_0  |
| 4                    | /bucketing_table1/21_0  |
| 1                    | /bucketing_table1/29_0  |
+----------------------+-------------------------+

select *, INPUT__FILE__NAME from bucketing_table2;
+----------------------+-------------------------+
| bucketing_table2.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table2/00_0  |
| 3                    | /bucketing_table2/01_0  |
| 5                    | /bucketing_table2/02_0  |
| 4                    | /bucketing_table2/03_0  |
| 1                    | /bucketing_table2/04_0  |
+----------------------+-------------------------+{code}
Workaround for read: hive.tez.bucket.pruning=false;
PS: Attaching repro file [^test.q]

was:
Insert-select is not honoring bucketing if both the source & target are bucketed on the same column.
e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) % 32 should go to the 29th bucket file. bucketing_table1 has id=1 at the 29th file, but bucketing_table2 doesn't have a 29th file because the insert-select didn't honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1  // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0  // incorrect result

select *, INPUT__FILE__NAME from bucketing_table1;
+----------------------+-------------------------+
| bucketing_table1.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table1/04_0  |
| 3                    | /bucketing_table1/06_0  |
| 5                    | /bucketing_table1/15_0  |
| 4                    | /bucketing_table1/21_0  |
| 1                    | /bucketing_table1/29_0  |
+----------------------+-------------------------+

select *, INPUT__FILE__NAME from bucketing_table2;
+----------------------+-------------------------+
| bucketing_table2.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table2/00_0  |
| 3                    | /bucketing_table2/01_0  |
| 5                    | /bucketing_table2/02_0  |
| 4                    | /bucketing_table2/03_0  |
| 1                    | /bucketing_table2/04_0  |
+----------------------+-------------------------+{code}
Query to identify in which bucket file a particular row should be:
{code:java}
with t as (select *, murmur_hash(id)%32 as bucket, INPUT__FILE__NAME from bucketing_table1)
select id, (case when bucket > 0 then bucket else 32 + bucket end) as bucket_number, INPUT__FILE__NAME from t;
+-----+----------------+-------------------------+
| id  | bucket_number  | input__file__name       |
+-----+----------------+-------------------------+
| 2   | 4              | /bucketing_table1/04_0  |
| 3   | 6              | /bucketing_table1/06_0  |
| 5   | 15             | /bucketing_table1/15_0  |
| 4   | 21             | /bucketing_table1/21_0  |
| 1   | 29             | /bucketing_table1/29_0  |
+-----+----------------+-------------------------+{code}
Workaround for read: hive.tez.bucket.pruning=false;
PS: Attaching repro file [^test.q]
[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table
[ https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-28213:
------------------------------
Description:
Insert-select is not honoring bucketing if both the source & target are bucketed on the same column.
e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) % 32 should go to the 29th bucket file. bucketing_table1 has id=1 at the 29th file, but bucketing_table2 doesn't have a 29th file because the insert-select didn't honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1  // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0  // incorrect result

select *, INPUT__FILE__NAME from bucketing_table1;
+----------------------+-------------------------+
| bucketing_table1.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table1/04_0  |
| 3                    | /bucketing_table1/06_0  |
| 5                    | /bucketing_table1/15_0  |
| 4                    | /bucketing_table1/21_0  |
| 1                    | /bucketing_table1/29_0  |
+----------------------+-------------------------+

select *, INPUT__FILE__NAME from bucketing_table2;
+----------------------+-------------------------+
| bucketing_table2.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table2/00_0  |
| 3                    | /bucketing_table2/01_0  |
| 5                    | /bucketing_table2/02_0  |
| 4                    | /bucketing_table2/03_0  |
| 1                    | /bucketing_table2/04_0  |
+----------------------+-------------------------+{code}
Query to identify in which bucket file a particular row should be:
{code:java}
with t as (select *, murmur_hash(id)%32 as bucket, INPUT__FILE__NAME from bucketing_table1)
select id, (case when bucket > 0 then bucket else 32 + bucket end) as bucket_number, INPUT__FILE__NAME from t;
+-----+----------------+-------------------------+
| id  | bucket_number  | input__file__name       |
+-----+----------------+-------------------------+
| 2   | 4              | /bucketing_table1/04_0  |
| 3   | 6              | /bucketing_table1/06_0  |
| 5   | 15             | /bucketing_table1/15_0  |
| 4   | 21             | /bucketing_table1/21_0  |
| 1   | 29             | /bucketing_table1/29_0  |
+-----+----------------+-------------------------+{code}
Workaround for read: hive.tez.bucket.pruning=false;
PS: Attaching repro file [^test.q]

was:
Insert-select is not honoring bucketing if both the source & target are bucketed on the same column.
e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) % 32 should go to the 29th bucket file. bucketing_table1 has id=1 at the 29th file, but bucketing_table2 doesn't have a 29th file because the insert-select didn't honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1  // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0  // incorrect result

select *, INPUT__FILE__NAME from bucketing_table1;
+----------------------+-------------------------+
| bucketing_table1.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table1/04_0  |
| 3                    | /bucketing_table1/06_0  |
| 5                    | /bucketing_table1/15_0  |
| 4                    | /bucketing_table1/21_0  |
| 1                    | /bucketing_table1/29_0  |
+----------------------+-------------------------+

select *, INPUT__FILE__NAME from bucketing_table2;
+----------------------+-------------------------+
| bucketing_table2.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table2/00_0  |
| 3                    | /bucketing_table2/01_0  |
| 5                    | /bucketing_table2/02_0  |
| 4                    | /bucketing_table2/03_0  |
| 1                    | /bucketing_table2/04_0  |
+----------------------+-------------------------+{code}
Workaround for read: hive.tez.bucket.pruning=false;
PS: Attaching repro file [^test.q]
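The CASE expression in the diagnostic query above exists because `murmur_hash(id) % 32` can be negative while bucket file indexes are non-negative. A minimal Java sketch of that index adjustment (illustrative only; `bucketFor` is a hypothetical name, not Hive's actual bucketing code):

```java
public class BucketMapping {
    // Map a raw (possibly negative) hash to a bucket index in [0, numBuckets).
    // Java's % keeps the dividend's sign, which is what the query's
    // "case when bucket > 0 then bucket else 32 + bucket" compensates for;
    // Math.floorMod performs the same wrap-around in one step.
    static int bucketFor(int hash, int numBuckets) {
        return Math.floorMod(hash, numBuckets);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(100, 32)); // positive hash: plain remainder, 4
        System.out.println(bucketFor(-3, 32));  // negative hash wraps to 29 instead of -3
    }
}
```

A writer and a reader that disagree on this mapping produce exactly the symptom above: the row lands in one file while bucket pruning looks in another.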
[jira] [Updated] (HIVE-28216) Upgrade Commons-Configuration to 2.10.1
[ https://issues.apache.org/jira/browse/HIVE-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28216: -- Labels: pull-request-available (was: ) > Upgrade Commons-Configuration to 2.10.1 > --- > > Key: HIVE-28216 > URL: https://issues.apache.org/jira/browse/HIVE-28216 > Project: Hive > Issue Type: Task >Reporter: Devaspati Krishnatri >Assignee: Devaspati Krishnatri >Priority: Major > Labels: pull-request-available > > Upgrade Commons-Configuration to 2.10.1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28216) Upgrade Commons-Configuration to 2.10.1
Devaspati Krishnatri created HIVE-28216: --- Summary: Upgrade Commons-Configuration to 2.10.1 Key: HIVE-28216 URL: https://issues.apache.org/jira/browse/HIVE-28216 Project: Hive Issue Type: Task Reporter: Devaspati Krishnatri Assignee: Devaspati Krishnatri Upgrade Commons-Configuration to 2.10.1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28215) Signalling CONDITION HANDLER is not working in HPLSQL.
Dayakar M created HIVE-28215:
--------------------------------
Summary: Signalling CONDITION HANDLER is not working in HPLSQL.
Key: HIVE-28215
URL: https://issues.apache.org/jira/browse/HIVE-28215
Project: Hive
Issue Type: Bug
Components: hpl/sql
Reporter: Dayakar M
Assignee: Dayakar M

Signalling CONDITION HANDLER is not working in HPLSQL.
Steps to Reproduce:
{noformat}
jdbc:hive2://ccycloud-1.nightly-71x-oq.roo> DECLARE cnt INT DEFAULT 0;
. . . . . . . . . . . . . . . . . . . . . . .> DECLARE wrong_cnt_condition CONDITION;
. . . . . . . . . . . . . . . . . . . . . . .>
. . . . . . . . . . . . . . . . . . . . . . .> DECLARE EXIT HANDLER FOR wrong_cnt_condition
. . . . . . . . . . . . . . . . . . . . . . .>   PRINT 'Wrong number of rows';
. . . . . . . . . . . . . . . . . . . . . . .>
. . . . . . . . . . . . . . . . . . . . . . .> EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM sys.tbls' INTO cnt;
. . . . . . . . . . . . . . . . . . . . . . .>
. . . . . . . . . . . . . . . . . . . . . . .> IF cnt <> 0 THEN
. . . . . . . . . . . . . . . . . . . . . . .>   SIGNAL wrong_cnt_condition;
. . . . . . . . . . . . . . . . . . . . . . .> END IF;
. . . . . . . . . . . . . . . . . . . . . . .> /
INFO  : Compiling command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b): SELECT COUNT(*) FROM sys.tbls
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b); Time taken: 0.995 seconds
INFO  : Completed executing command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b); Time taken: 8.479 seconds
INFO  : OK
ERROR : wrong_cnt_condition
No rows affected (9.559 seconds)
0: jdbc:hive2://localhost>{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
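In the repro above, SIGNAL surfaces the raw condition name (`ERROR : wrong_cnt_condition`) instead of executing the declared EXIT HANDLER body (`PRINT 'Wrong number of rows'`). A minimal Java sketch of the dispatch the report expects; all names are hypothetical illustrations, not HPL/SQL internals:

```java
import java.util.HashMap;
import java.util.Map;

public class ConditionHandlers {
    // condition name -> handler body (here, the message the handler emits)
    private final Map<String, Runnable> handlers = new HashMap<>();
    final StringBuilder output = new StringBuilder();

    // Models DECLARE EXIT HANDLER FOR <condition> <body>
    void declareHandler(String condition, String message) {
        handlers.put(condition, () -> output.append(message));
    }

    // Models SIGNAL <condition>: run the matching handler if one was declared;
    // only fall back to a raw error when no handler exists. The bug report
    // shows the fallback path being taken even though a handler was declared.
    void signal(String condition) {
        Runnable handler = handlers.get(condition);
        if (handler != null) {
            handler.run();
        } else {
            output.append("ERROR : ").append(condition);
        }
    }
}
```

With this dispatch, signalling `wrong_cnt_condition` after declaring its handler yields "Wrong number of rows", which is the behavior the repro expects.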
[jira] [Updated] (HIVE-28214) HPLSQL not using the hive variables passed through beeline using --hivevar option
[ https://issues.apache.org/jira/browse/HIVE-28214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dayakar M updated HIVE-28214:
-----------------------------
Description:
HPL/SQL is not using the Hive variables passed through Beeline's --hivevar option.
Steps to reproduce:
{noformat}
beeline -u 'jdbc:hive2://localhost:1/default;user=hive;password=hive;mode=hplsql' --hivevar hivedb=sys --hivevar hivetbl=tbls{noformat}
{noformat}
0: jdbc:hive2://localhost> DECLARE hivedb_tbl string;
. . . . . . . . . . . . . . . . . . . . . . .> SELECT hivedb || '.' || hivetbl into hivedb_tbl;
. . . . . . . . . . . . . . . . . . . . . . .> PRINT hivedb_tbl;
. . . . . . . . . . . . . . . . . . . . . . .> /
INFO  : Compiling command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb): SELECT CONCAT(hivedb, '.', hivetbl)
ERROR : FAILED: SemanticException [Error 10004]: Line 1:14 Invalid table alias or column reference 'hivedb': (possible column names are: )
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:14 Invalid table alias or column reference 'hivedb': (possible column names are: )
INFO  : Completed compiling command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb); Time taken: 3.976 seconds
ERROR : Unhandled exception in HPL/SQL
No rows affected (4.901 seconds)
0: jdbc:hive2://localhost>
{noformat}

was:
HPL/SQL is not using the Hive variables passed through Beeline's --hivevar option.
Steps to reproduce:
{noformat}
beeline -u 'jdbc:hive2://localhost:1/default;user=hive;password=hive;mode=hplsql' --hivevar hivedb=sys --hivevar hivetbl=tbls{noformat}
{noformat}
0: jdbc:hive2://localhost> DECLARE hivedb_tbl string;
. . . . . . . . . . . . . . . . . . . . . . .> SELECT hivedb || '.' || hivetbl into hivedb_tbl;
. . . . . . . . . . . . . . . . . . . . . . .> PRINT hivedb_tbl;
. . . . . . . . . . . . . . . . . . . . . . .> /
INFO  : Compiling command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb): SELECT CONCAT(hivedb, '.', hivetbl)
ERROR : FAILED: SemanticException [Error 10004]: Line 1:14 Invalid table alias or column reference 'hivedb': (possible column names are: )
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:14 Invalid table alias or column reference 'hivedb': (possible column names are: )
INFO  : Completed compiling command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb); Time taken: 3.976 seconds
ERROR : Unhandled exception in HPL/SQL
No rows affected (4.901 seconds)
0: jdbc:hive2://localhost>
{noformat}

> HPLSQL not using the hive variables passed through beeline using --hivevar
> option
> --------------------------------------------------------------------------
>
> Key: HIVE-28214
> URL: https://issues.apache.org/jira/browse/HIVE-28214
> Project: Hive
> Issue Type: Bug
> Components: hpl/sql
> Reporter: Dayakar M
> Assignee: Dayakar M
> Priority: Major
>
> HPL/SQL is not using the Hive variables passed through Beeline's --hivevar option.
> Steps to reproduce:
> {noformat}
> beeline -u 'jdbc:hive2://localhost:1/default;user=hive;password=hive;mode=hplsql' --hivevar hivedb=sys --hivevar hivetbl=tbls{noformat}
> {noformat}
> 0: jdbc:hive2://localhost> DECLARE hivedb_tbl string;
> . . . . . . . . . . . . . . . . . . . . . . .> SELECT hivedb || '.' || hivetbl into hivedb_tbl;
> . . . . . . . . . . . . . . . . . . . . . . .> PRINT hivedb_tbl;
> . . . . . . . . . . . . . . . . . . . . . . .> /
> INFO  : Compiling command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb): SELECT CONCAT(hivedb, '.', hivetbl)
> ERROR : FAILED: SemanticException [Error 10004]: Line 1:14 Invalid table alias or column reference 'hivedb': (possible column names are: )
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:14 Invalid table alias or column reference 'hivedb': (possible column names are: )
>
> INFO  : Completed compiling command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb); Time taken: 3.976 seconds
> ERROR : Unhandled exception in HPL/SQL
> No rows affected (4.901 seconds)
> 0: jdbc:hive2://localhost>
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HIVE-28214) HPLSQL not using the hive variables passed through beeline using --hivevar option
Dayakar M created HIVE-28214:
--------------------------------
Summary: HPLSQL not using the hive variables passed through beeline using --hivevar option
Key: HIVE-28214
URL: https://issues.apache.org/jira/browse/HIVE-28214
Project: Hive
Issue Type: Bug
Components: hpl/sql
Reporter: Dayakar M
Assignee: Dayakar M

HPL/SQL is not using the Hive variables passed through Beeline's --hivevar option.
Steps to reproduce:
{noformat}
beeline -u 'jdbc:hive2://localhost:1/default;user=hive;password=hive;mode=hplsql' --hivevar hivedb=sys --hivevar hivetbl=tbls{noformat}
{noformat}
0: jdbc:hive2://localhost> DECLARE hivedb_tbl string;
. . . . . . . . . . . . . . . . . . . . . . .> SELECT hivedb || '.' || hivetbl into hivedb_tbl;
. . . . . . . . . . . . . . . . . . . . . . .> PRINT hivedb_tbl;
. . . . . . . . . . . . . . . . . . . . . . .> /
INFO  : Compiling command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb): SELECT CONCAT(hivedb, '.', hivetbl)
ERROR : FAILED: SemanticException [Error 10004]: Line 1:14 Invalid table alias or column reference 'hivedb': (possible column names are: )
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:14 Invalid table alias or column reference 'hivedb': (possible column names are: )
INFO  : Completed compiling command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb); Time taken: 3.976 seconds
ERROR : Unhandled exception in HPL/SQL
No rows affected (4.901 seconds)
0: jdbc:hive2://localhost>
{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
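The SemanticException above shows `hivedb` and `hivetbl` reaching the Hive compiler as bare column references, meaning the --hivevar values were never substituted before compilation. A toy Java sketch of the missing substitution step (hypothetical helper; a real implementation would also have to respect quoting and identifier boundaries):

```java
import java.util.Map;

public class HivevarSubstitution {
    // Replace bare references to known --hivevar names with their values
    // before the statement is handed to the compiler. \b keeps hivedb_tbl
    // (a declared local variable) from being touched, since '_' is a word
    // character and the boundary match fails inside it.
    static String substitute(String sql, Map<String, String> hivevars) {
        String out = sql;
        for (Map.Entry<String, String> e : hivevars.entrySet()) {
            out = out.replaceAll("\\b" + e.getKey() + "\\b", e.getValue());
        }
        return out;
    }
}
```

With `hivedb=sys` and `hivetbl=tbls`, the failing statement would compile as `SELECT CONCAT(sys, ...)` only if substitution never ran; after substitution it references the literal values instead.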
[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table
[ https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-28213:
------------------------------
Description:
Insert-select is not honoring bucketing if both the source & target are bucketed on the same column.
e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) % 32 should go to the 29th bucket file. bucketing_table1 has id=1 at the 29th file, but bucketing_table2 doesn't have a 29th file because the insert-select didn't honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1  // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0  // incorrect result

select *, INPUT__FILE__NAME from bucketing_table1;
+----------------------+-------------------------+
| bucketing_table1.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table1/04_0  |
| 3                    | /bucketing_table1/06_0  |
| 5                    | /bucketing_table1/15_0  |
| 4                    | /bucketing_table1/21_0  |
| 1                    | /bucketing_table1/29_0  |
+----------------------+-------------------------+

select *, INPUT__FILE__NAME from bucketing_table2;
+----------------------+-------------------------+
| bucketing_table2.id  | input__file__name       |
+----------------------+-------------------------+
| 2                    | /bucketing_table2/00_0  |
| 3                    | /bucketing_table2/01_0  |
| 5                    | /bucketing_table2/02_0  |
| 4                    | /bucketing_table2/03_0  |
| 1                    | /bucketing_table2/04_0  |
+----------------------+-------------------------+{code}
Workaround for read: hive.tez.bucket.pruning=false;
PS: Attaching repro file [^test.q]

was:
Insert-select is not honoring bucketing if both the source & target are bucketed on the same column.
e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) % 32 should go to the 29th bucket file. bucketing_table1 has id=1 at the 29th file, but bucketing_table2 doesn't have a 29th file because the insert-select didn't honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1  // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0  // incorrect result{code}
Workaround: hive.tez.bucket.pruning=false;
PS: Attaching repro file [^test.q]

> Incorrect results after insert-select from similar bucketed source & target
> table
> --------------------------------------------------------------------------
>
> Key: HIVE-28213
> URL: https://issues.apache.org/jira/browse/HIVE-28213
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Priority: Major
> Attachments: test.q
>
>
> Insert-select is not honoring bucketing if both the source & target are bucketed on the same column.
> e.g.,
> {code:java}
> CREATE EXTERNAL TABLE bucketing_table1 (id INT)
> CLUSTERED BY (id)
> SORTED BY (id ASC)
> INTO 32 BUCKETS stored as textfile;
> INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);
> CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
> INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
> id=1 => murmur_hash(1) % 32 should go to the 29th bucket file.
> bucketing_table1 has id=1 at the 29th file,
> but bucketing_table2 doesn't have a 29th file because the insert-select didn't honor the bucketing.
> {code:java}
> SELECT count(*) FROM bucketing_table1 WHERE id = 1;
> ===
> 1  // correct result
> SELECT count(*) FROM bucketing_table2 WHERE id = 1;
> ===
> 0  // incorrect result
> select *, INPUT__FILE__NAME from bucketing_table1;
> +----------------------+-------------------------+
> | bucketing_table1.id  | input__file__name       |
> +----------------------+-------------------------+
> | 2                    | /bucketing_table1/04_0  |
> | 3                    | /bucketing_table1/06_0  |
> | 5                    | /bucketing_table1/15_0  |
> | 4                    | /bucketing_table1/21_0  |
> | 1                    | /bucketing_table1/29_0  |
> +----------------------+-------------------------+
[jira] [Created] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table
Naresh P R created HIVE-28213:
---------------------------------
Summary: Incorrect results after insert-select from similar bucketed source & target table
Key: HIVE-28213
URL: https://issues.apache.org/jira/browse/HIVE-28213
Project: Hive
Issue Type: Bug
Reporter: Naresh P R
Attachments: test.q

Insert-select is not honoring bucketing if both the source & target are bucketed on the same column.
e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) % 32 should go to the 29th bucket file. bucketing_table1 has id=1 at the 29th file, but bucketing_table2 doesn't have a 29th file because the insert-select didn't honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1  // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0  // incorrect result{code}
Workaround: hive.tez.bucket.pruning=false;
PS: Attaching repro file [^test.q]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-28165) HiveSplitGenerator: send splits through filesystem instead of RPC in case of big payload
[ https://issues.apache.org/jira/browse/HIVE-28165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840436#comment-17840436 ]

László Bodor commented on HIVE-28165:
-------------------------------------
This ticket cannot be merged until TEZ-4556 is resolved (i.e. Tez 0.10.4 is released). On the PR there is an agreement, which I consider a green path to rebase and merge this once Tez 0.10.4 is released. Full precommit testing was done with a downstream Hive+Tez where both HIVE-28165 and TEZ-4548 were present.

> HiveSplitGenerator: send splits through filesystem instead of RPC in case of
> big payload
> ----------------------------------------------------------------------------
>
> Key: HIVE-28165
> URL: https://issues.apache.org/jira/browse/HIVE-28165
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
>
> After some investigations regarding hive iceberg issues, it turned out that in the presence of delete files, the serialized payload might be huge, like 1-4MB / split, which might lead to extreme memory pressure in the Tez AM, getting worse when having more and more splits.
> Optimizing the payload is always the best option but it's not that obvious: instead, we should make hive and tez together take care of such situations without running into OOMs like this below:
> {code}
> ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
> Vertex failed, vertexName=Map 1, vertexId=vertex_1711290808080__4_00, diagnostics=[Vertex vertex_1711290808080__4_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: web_sales_1 initializer failed, vertex=vertex_1711290808080__4_00 [Map 1], java.lang.OutOfMemoryError: Java heap space
>   at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:907)
>   at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:902)
>   at com.google.protobuf.ByteString.newCodedBuilder(ByteString.java:898)
>   at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>   at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:378)
>   at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:337)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$3(RootInputInitializerManager.java:199)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$319/0x000840942440.run(Unknown Source)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:192)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:173)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$2(RootInputInitializerManager.java:167)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$318/0x000840942040.run(Unknown Source)
>   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
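The direction in the summary, sending splits through the filesystem instead of RPC when the payload is big, amounts to a size check at send time. A hypothetical illustration (names and threshold invented, not the actual patch):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SplitPayloadTransport {
    // Above this size the payload is spilled to the filesystem instead of
    // being embedded in the RPC event. Illustrative value; a real
    // implementation would make this configurable.
    static final int RPC_LIMIT_BYTES = 1024;

    // Returns the inline payload for small splits, or the Path of a file
    // holding the payload for big ones; the receiver would then read the
    // file instead of deserializing a huge RPC message in the AM heap.
    static Object send(byte[] payload, Path spillDir) throws IOException {
        if (payload.length <= RPC_LIMIT_BYTES) {
            return payload;                        // small: inline over RPC
        }
        Path spill = Files.createTempFile(spillDir, "splits-", ".bin");
        Files.write(spill, payload);               // big: fetch from FS
        return spill;
    }
}
```

The point of the threshold is that the 1-4 MB per-split payloads mentioned above never get concatenated into one giant protobuf in the AM, which is where the `ByteString` OOM in the stack trace originates.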
[jira] [Updated] (HIVE-28212) MiniHS2: use a base folder which is more likely writable on the local FS
[ https://issues.apache.org/jira/browse/HIVE-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-28212:
--------------------------------
Description:
we hardcode a HDFS session dir like below:
https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307
{code}
baseFsDir = new Path(new Path(fs.getUri()), "/base");
{code}
this can lead to problems with tez local mode with mini hs2, as tez [mirrors|https://github.com/apache/tez/blob/f080031f5c72bc4bfd8090ccdc670bdc0f7fd090/tez-dag/src/main/java/org/apache/tez/client/LocalClient.java#L308-L335] the hdfs contents to a local folder, and later this leads to a confusing message like:
{code}
2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error starting DAGAppMaster
java.io.FileNotFoundException: /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb (No such file or directory)
  at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292]
  at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292]
  at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292]
  at org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3]
  at org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3]
  at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3]
  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}
btw, this confusing message will be fixed in TEZ-4555, but we need to give something different than /base
it doesn't make sense to hack a different folder into tez for the local mode; instead we should change the hardcoded "/base" in MiniHS2, which might be more durable and solves the above-mentioned problem
currently, hive's default scratch dir is [/tmp/hive|https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L498]

was:
we hardcode a HDFS session dir like below:
https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307
{code}
baseFsDir = new Path(new Path(fs.getUri()), "/base");
{code}
this can lead to problems with tez local mode with mini hs2, as tez mirrors the hdfs contents to a local folder, and later this leads to a confusing message like:
{code}
2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error starting DAGAppMaster
java.io.FileNotFoundException: /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb (No such file or directory)
  at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292]
  at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292]
  at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292]
  at org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3]
  at org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3]
  at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3]
  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}
btw, this confusing message will be fixed in TEZ-4555, but we need to give something different than /base
it doesn't make sense to hack a different folder into tez for the local mode; instead we should change the hardcoded "/base" in MiniHS2, which might be more durable and solves the above-mentioned problem
currently, hive's default scratch dir is
[/tmp/hive|https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L498] > MiniHS2: use a base folder which is more likely writable on the local FS > > > Key: HIVE-28212 > URL: https://issues.apache.org/jira/browse/HIVE-28212 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > > we hardcode a HDFS session dir like below: > https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#
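The direction proposed in HIVE-28212 can be illustrated with a small, hypothetical sketch (this is not the actual MiniHS2 patch; the class and folder names below are made up for illustration): derive the base directory from java.io.tmpdir and the current user instead of hardcoding "/base", so the path that Tez local mode mirrors onto the local file system lands somewhere writable, much like Hive's default /tmp/hive scratch dir.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, NOT the actual MiniHS2 fix: compute a base dir under
// the JVM temp dir and the current user instead of the hardcoded "/base".
public class BaseDirSketch {
    static Path resolveBaseDir() {
        String tmp = System.getProperty("java.io.tmpdir");  // e.g. /tmp
        String user = System.getProperty("user.name");
        // e.g. /tmp/hive_base_<user>; writable on the local FS even when
        // Tez local mode mirrors the MiniHS2 session dir locally
        return Paths.get(tmp, "hive_base_" + user);
    }

    public static void main(String[] args) {
        System.out.println(resolveBaseDir());
    }
}
```

Any scheme along these lines avoids the FileNotFoundException above, because the mirrored local path no longer points at a root-owned location like /base.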
[jira] [Resolved] (HIVE-27568) Implement RegisterTableProcedure for Iceberg Tables
[ https://issues.apache.org/jira/browse/HIVE-27568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HIVE-27568. - Fix Version/s: Not Applicable Resolution: Information Provided We already have one: https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java#L291 > Implement RegisterTableProcedure for Iceberg Tables > --- > > Key: HIVE-27568 > URL: https://issues.apache.org/jira/browse/HIVE-27568 > Project: Hive > Issue Type: Improvement >Reporter: Manish Maheshwari >Priority: Major > Labels: iceberg > Fix For: Not Applicable > > > Implement RegisterTableProcedure for registering existing iceberg tables into > the catalog > [https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RegisterTableProcedure.java] > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27900) hive can not read iceberg-parquet table
[ https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HIVE-27900. - Fix Version/s: Not Applicable Resolution: Not A Problem > hive can not read iceberg-parquet table > --- > > Key: HIVE-27900 > URL: https://issues.apache.org/jira/browse/HIVE-27900 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Affects Versions: 4.0.0-beta-1 >Reporter: yongzhi.shao >Priority: Minor > Fix For: Not Applicable > > > We found that using HIVE4-BETA version, we could not query the > Iceberg-Parquet table with vectorised execution turned on. > {code:java} > --spark-sql(3.4.1+iceberg 1.4.2) > CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy ( > a string,b string,c string) > USING iceberg > LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy' > TBLPROPERTIES ( > 'current-snapshot-id' = '5138351937447353683', > 'format' = 'iceberg/parquet', > 'format-version' = '2', > 'read.orc.vectorization.enabled' = 'true', > 'write.format.default' = 'parquet', > 'write.metadata.delete-after-commit.enabled' = 'true', > 'write.metadata.previous-versions-max' = '3', > 'write.parquet.compression-codec' = 'snappy'); > --hive-sql > CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > LOCATION > 'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/' > TBLPROPERTIES > ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); > set hive.default.fileformat=orc; > set hive.default.fileformat.managed=orc; > create table test_parquet_as_orc as select * from > b_qqd_shop_rfm_parquet_snappy limit 100; > , TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while > running task ( failure ) : > attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while 
processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) > at > 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) > ... 16 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) > ... 19 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137) > at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) > at > org.apache.hadoop.hive.ql.exec.vector.VectorS
[jira] [Resolved] (HIVE-25041) During "schematool --verbose -dbType derby -initSchema" I'm getting "utocommit on" (with a missing 'a').
[ https://issues.apache.org/jira/browse/HIVE-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HIVE-25041. - Fix Version/s: Not Applicable Resolution: Not A Bug Jira is for reporting bugs; for user-level issues, reach out to the hive user mailing lists for help > During "schematool --verbose -dbType derby -initSchema" I'm getting > "utocommit on" (with a missing 'a'). > - > > Key: HIVE-25041 > URL: https://issues.apache.org/jira/browse/HIVE-25041 > Project: Hive > Issue Type: Bug > Components: Database/Schema, Hive >Affects Versions: 3.1.2 >Reporter: NOELLE MILTON VEGA >Priority: Blocker > Fix For: Not Applicable > > > Hello Friends: > I'm issuing the below command, but am getting the exception shown. This is a > *pseudo-distributed* mode setup of *HIVE* and *HADOOP* (simple), so I've > edited a few files (just following the vanilla instructions – nothing > fancy). > Yet somewhere it looks like there's a typo, perhaps in this file: > > {noformat} > hive-schema-3.1.0.derby.sql{noformat} > > From the below, {color:#0747a6}*utocommit on*{color} looks like it should be > {color:#0747a6}*autocommit on*{color}. > {code:java} > jdoe@fedora-33$ ${HIVE_HOME}/bin/schematool --verbose -dbType derby > -initSchema > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/opt/hadoop/hadoop.d/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/opt/hadoop/hive.d/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > 2021-04-20 21:09:57,605 INFO [main] conf.HiveConf > (HiveConf.java:findConfigFile(187)) - Found configuration file > file:/opt/hadoop/hive.d/conf/hive-site.xml > 2021-04-20 21:09:58,013 INFO [main] tools.HiveSchemaHelper > (HiveSchemaHelper.java:logAndPrintToStdout(117)) - Metastore connection URL: > jdbc:derby:;databaseName=metastore_db;create=true > Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true > 2021-04-20 21:09:58,014 INFO [main] tools.HiveSchemaHelper > (HiveSchemaHelper.java:logAndPrintToStdout(117)) - Metastore Connection > Driver : org.apache.derby.jdbc.EmbeddedDriver > Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver > 2021-04-20 21:09:58,014 INFO [main] tools.HiveSchemaHelper > (HiveSchemaHelper.java:logAndPrintToStdout(117)) - Metastore connection User: > APP > Metastore connection User: APP > Starting metastore schema initialization to 3.1.0 > Initialization script hive-schema-3.1.0.derby.sql > Connecting to jdbc:derby:;databaseName=metastore_db;create=true > Connected to: Apache Derby (version 10.14.1.0 - (1808820)) > Driver: Apache Derby Embedded JDBC Driver (version 10.14.1.0 - (1808820)) > Transaction isolation: TRANSACTION_READ_COMMITTED > 0: jdbc:derby:> utocommit on > Error: Syntax error: Encountered "utocommit" at line 1, column 1. > (state=42X01,code=3) > Closing: 0: jdbc:derby:;databaseName=metastore_db;create=true > org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization > FAILED! Metastore state would be inconsistent !! > Underlying cause: java.io.IOException : Schema script failed, errorcode 2 > org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization > FAILED! Metastore state would be inconsistent !! 
> at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:594) > at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:567) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1517) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at org.apache.hadoop.util.RunJar.run(RunJar.java:323) > at org.apache.hadoop.util.RunJar.main(RunJar.java:236) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:1226) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:1204) > at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:590) > ... 8 more > *** schemaTool failed ***{code} > > Versions are: > {code:java} > Hive..: v3.1.2 > Hadoop: v3.3.0{code} > Any ideas? Thank you. -- This message was
[jira] [Updated] (HIVE-28212) MiniHS2: use a base folder which is more likely writable on the local FS
[ https://issues.apache.org/jira/browse/HIVE-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-28212: Description: we hardcode an HDFS session dir like below: https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 {code} baseFsDir = new Path(new Path(fs.getUri()), "/base"); {code} this can lead to problems with tez local mode with mini hs2, as tez mirrors the hdfs contents to a local folder, and later this leads to a confusing message like: {code} 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error starting DAGAppMaster java.io.FileNotFoundException: /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb (No such file or directory) at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292] at org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] {code} btw, this confusing message will be fixed in TEZ-4555, but we need to give something different than /base. It doesn't make sense to hack a different folder in tez for the local mode; instead we should change the hardcoded "/base" in MiniHS2, which might be more durable and solves the above-mentioned problem. Currently, hive's default scratch dir is 
[/tmp/hive|https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L498] was: we hardcode an HDFS session dir like below: https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 {code} baseFsDir = new Path(new Path(fs.getUri()), "/base"); {code} this can lead to problems with tez local mode with mini hs2, as tez mirrors the hdfs contents to a local folder, and later this leads to a confusing message like: {code} 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error starting DAGAppMaster java.io.FileNotFoundException: /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb (No such file or directory) at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292] at org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] {code} btw, this confusing message will be fixed in TEZ-4555, but we need to give something different than /base. Currently, hive's default scratch dir is [/tmp/hive|https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L498] > MiniHS2: use a base folder which is more likely writable on the local FS > > > Key: HIVE-28212 > URL: 
https://issues.apache.org/jira/browse/HIVE-28212 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > > we hardcode an HDFS session dir like below: > https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 > {code} > baseFsDir = new Path(new Path(fs.getUri()), "/base"); > {code} > this can lead to problems with tez local mode with mini hs2, as tez mirrors > the hdfs contents to a local folder, and later this leads to a confusing > message like: > {code} > 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error > st
[jira] [Work started] (HIVE-28212) MiniHS2: use a base folder which is more likely writable on the local FS
[ https://issues.apache.org/jira/browse/HIVE-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-28212 started by László Bodor. --- > MiniHS2: use a base folder which is more likely writable on the local FS > > > Key: HIVE-28212 > URL: https://issues.apache.org/jira/browse/HIVE-28212 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > > we hardcode an HDFS session dir like below: > https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 > {code} > baseFsDir = new Path(new Path(fs.getUri()), "/base"); > {code} > this can lead to problems with tez local mode with mini hs2, as tez mirrors > the hdfs contents to a local folder, and later this leads to a confusing > message like: > {code} > 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error > starting DAGAppMaster > java.io.FileNotFoundException: > /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] > at java.io.FileInputStream.<init>(FileInputStream.java:138) > ~[?:1.8.0_292] > at > org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) > ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at > org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) > ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) > [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] > {code} > btw, this confusing message will be fixed in TEZ-4555, but we need to give > 
something different than /base. > Currently, hive's default scratch dir is > [/tmp/hive|https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L498] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28212) MiniHS2: use a base folder which is more likely writable on the local FS
[ https://issues.apache.org/jira/browse/HIVE-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28212: -- Labels: pull-request-available (was: ) > MiniHS2: use a base folder which is more likely writable on the local FS > > > Key: HIVE-28212 > URL: https://issues.apache.org/jira/browse/HIVE-28212 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > > we hardcode an HDFS session dir like below: > https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 > {code} > baseFsDir = new Path(new Path(fs.getUri()), "/base"); > {code} > this can lead to problems with tez local mode with mini hs2, as tez mirrors > the hdfs contents to a local folder, and later this leads to a confusing > message like: > {code} > 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error > starting DAGAppMaster > java.io.FileNotFoundException: > /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] > at java.io.FileInputStream.<init>(FileInputStream.java:138) > ~[?:1.8.0_292] > at > org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) > ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at > org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) > ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) > [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] > {code} > btw, this confusing message will be fixed in 
TEZ-4555, but we need to give > something different than /base. > Currently, hive's default scratch dir is > [/tmp/hive|https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L498] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28212) MiniHS2: use a base folder which is more likely writable on the local FS
[ https://issues.apache.org/jira/browse/HIVE-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-28212: Description: we hardcode an HDFS session dir like below: https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 {code} baseFsDir = new Path(new Path(fs.getUri()), "/base"); {code} this can lead to problems with tez local mode with mini hs2, as tez mirrors the hdfs contents to a local folder, and later this leads to a confusing message like: {code} 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error starting DAGAppMaster java.io.FileNotFoundException: /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb (No such file or directory) at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292] at org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] {code} btw, this confusing message will be fixed in TEZ-4555, but we need to give something different than /base. Currently, hive's default scratch dir is [/tmp/hive|https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L498] was: we hardcode an HDFS session dir like below: 
https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 {code} baseFsDir = new Path(new Path(fs.getUri()), "/base"); {code} this can lead to problems with tez local mode with mini hs2, as tez mirrors the hdfs contents to a local folder, and later this leads to a confusing message like: {code} 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error starting DAGAppMaster java.io.FileNotFoundException: /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb (No such file or directory) at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292] at org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] {code} brw > MiniHS2: use a base folder which is more likely writable on the local FS > > > Key: HIVE-28212 > URL: https://issues.apache.org/jira/browse/HIVE-28212 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > we hardcode an HDFS session dir like below: > https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 > {code} > baseFsDir = new Path(new Path(fs.getUri()), "/base"); > {code} > this can lead to problems with tez local mode with mini hs2, as tez mirrors > the hdfs contents to a 
local folder, and later this leads to a confusing > message like: > {code} > 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error > starting DAGAppMaster > java.io.FileNotFoundException: > /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] > at java.io.FileInputStream.<init>(FileInputStream.java:138) > ~[?:1.8.0_292] > at > org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsI
[jira] [Updated] (HIVE-28212) MiniHS2: use a base folder which is more likely writable on the local FS
[ https://issues.apache.org/jira/browse/HIVE-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-28212: Summary: MiniHS2: use a base folder which is more likely writable on the local FS (was: MiniHS2: use a base folder which is more likely to be writable on the local FS) > MiniHS2: use a base folder which is more likely writable on the local FS > > > Key: HIVE-28212 > URL: https://issues.apache.org/jira/browse/HIVE-28212 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > we hardcode an HDFS session dir like below: > https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 > {code} > baseFsDir = new Path(new Path(fs.getUri()), "/base"); > {code} > this can lead to problems with tez local mode with mini hs2, as tez mirrors > the hdfs contents to a local folder, and later this leads to a confusing > message like: > {code} > 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error > starting DAGAppMaster > java.io.FileNotFoundException: > /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] > at java.io.FileInputStream.<init>(FileInputStream.java:138) > ~[?:1.8.0_292] > at > org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) > ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at > org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) > ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) > [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at 
java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] > {code} > brw -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28212) MiniHS2: use a base folder which is more likely to be writable on the local FS
[ https://issues.apache.org/jira/browse/HIVE-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-28212: Description: we hardcode an HDFS session dir like below: https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 {code} baseFsDir = new Path(new Path(fs.getUri()), "/base"); {code} this can lead to problems with tez local mode with mini hs2, as tez mirrors the hdfs contents to a local folder, and later this leads to a confusing message like: {code} 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error starting DAGAppMaster java.io.FileNotFoundException: /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb (No such file or directory) at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292] at org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] {code} brw > MiniHS2: use a base folder which is more likely to be writable on the local FS > -- > > Key: HIVE-28212 > URL: https://issues.apache.org/jira/browse/HIVE-28212 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > we hardcode an HDFS session dir like below: > 
https://github.com/apache/hive/blob/2d855b27d31db6476f18870651db6987816bb5e3/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L307 > {code} > baseFsDir = new Path(new Path(fs.getUri()), "/base"); > {code} > this can lead to problems with tez local mode with mini hs2, as tez mirrors > the hdfs contents to a local folder, and later this leads to a confusing > message like: > {code} > 2024-04-24T02:03:52,101 ERROR [DAGAppMaster Thread] client.LocalClient: Error > starting DAGAppMaster > java.io.FileNotFoundException: > /base/scratch/laszlobodor/_tez_session_dir/b76689bc-d25e-4d65-a339-44206ff57ce2/.tez/application_1713949431891_0001_wd/tez-conf.pb > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292] > at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292] > at java.io.FileInputStream.<init>(FileInputStream.java:138) > ~[?:1.8.0_292] > at > org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:84) > ~[tez-common-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at > org.apache.tez.client.LocalClient.createDAGAppMaster(LocalClient.java:394) > ~[tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at org.apache.tez.client.LocalClient$1.run(LocalClient.java:357) > [tez-dag-0.9.1.2024.0.19.0-3.jar:0.9.1.2024.0.19.0-3] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] > {code} > brw -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28212) MiniHS2: use a base folder which is more likely to be writable on the local FS
László Bodor created HIVE-28212: --- Summary: MiniHS2: use a base folder which is more likely to be writable on the local FS Key: HIVE-28212 URL: https://issues.apache.org/jira/browse/HIVE-28212 Project: Hive Issue Type: Improvement Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-28212) MiniHS2: use a base folder which is more likely to be writable on the local FS
[ https://issues.apache.org/jira/browse/HIVE-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-28212: --- Assignee: László Bodor > MiniHS2: use a base folder which is more likely to be writable on the local FS > -- > > Key: HIVE-28212 > URL: https://issues.apache.org/jira/browse/HIVE-28212 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28160) Improve LICENSE for jquery and glyphicons-halflings fonts/icons
[ https://issues.apache.org/jira/browse/HIVE-28160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28160.
Fix Version/s: 4.1.0
Resolution: Fixed
Fixed in [https://github.com/apache/hive/commit/5f9cb486e4de27129ab0428a71e9cbb0d49fd087]. Thanks [~dengzh] for the review!
> Improve LICENSE for jquery and glyphicons-halflings fonts/icons
> ---------------------------------------------------------------
>
> Key: HIVE-28160
> URL: https://issues.apache.org/jira/browse/HIVE-28160
> Project: Hive
> Issue Type: Task
> Components: Documentation
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Fix For: 4.1.0
>
> The following files do not have an ASF header, and it is unclear which license applies to them and whether it complies with the ASF policy.
> * jquery.min.js (missing from LICENSE)
> * jquery.sparkline.min.js (typo in LICENSE)
> * glyphicons-halflings-regular.svg
> * glyphicons-halflings-regular.woff
> * glyphicons-halflings-regular.ttf
> * glyphicons-halflings-regular.eot
> The [Glyphicons|https://glyphicons.com/sets/halflings/] are available under multiple licenses, including commercial ones, so additional clarifications are needed to ensure that the ones Hive is using are compatible with the AL2.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28159) Remove copyright notice from ASF headers
[ https://issues.apache.org/jira/browse/HIVE-28159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28159.
Fix Version/s: 4.1.0
Resolution: Fixed
Fixed in [https://github.com/apache/hive/commit/5f9cb486e4de27129ab0428a71e9cbb0d49fd087]. Thanks [~dengzh] for the review!
> Remove copyright notice from ASF headers
> ----------------------------------------
>
> Key: HIVE-28159
> URL: https://issues.apache.org/jira/browse/HIVE-28159
> Project: Hive
> Issue Type: Task
> Components: Documentation
> Affects Versions: 4.0.0-beta-1
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Fix For: 4.1.0
>
> Currently, a few source files in the repo which have an ASF header also contain a copyright notice mentioning the ASF. There are various entries similar to the one below.
> {noformat}
> Copyright The Apache Software Foundation.
> {noformat}
> The [ASF policy|https://www.apache.org/legal/src-headers.html#headers] advises against this practice. The ASF header should not contain a copyright notice.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28158) Add ASF license header in non-java files
[ https://issues.apache.org/jira/browse/HIVE-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28158.
Fix Version/s: 4.1.0
Resolution: Fixed
Fixed in [https://github.com/apache/hive/commit/5f9cb486e4de27129ab0428a71e9cbb0d49fd087]. Thanks [~dengzh] for the review!
> Add ASF license header in non-java files
> ----------------------------------------
>
> Key: HIVE-28158
> URL: https://issues.apache.org/jira/browse/HIVE-28158
> Project: Hive
> Issue Type: Task
> Components: Documentation
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Fix For: 4.1.0
>
> According to the [ASF policy|https://www.apache.org/legal/src-headers.html], all source files should contain an ASF header. Currently, many source files do not contain the ASF header. The files can be broken into the following categories:
> *Must have:*
> * Python files (.py)
> * Bash/Shell script files (.sh)
> * Javascript files (.js)
> *Should have:*
> * Maven files (pom.xml)
> * GitHub workflows and Docker files (.yml)
> *Good to have:*
> * Hive/Tez/Yarn and other configuration files (.xml)
> * Log4J property files (.properties)
> * Markdown files (.md)
> *Could have but OK if they don't:*
> * Data files for tests (data/files/**)
> * Generated code files (src/gen)
> * QTest input/output files (.q, .q.out)
> * IntelliJ files (.idea)
> * Other txt and data files
> The changes here aim to address the first three categories (must, should, good) and add the missing header where possible.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28155) StringToDouble.java violates the ASF 3rd party license guidelines
[ https://issues.apache.org/jira/browse/HIVE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28155.
Fix Version/s: 4.1.0
Resolution: Fixed
Fixed in [https://github.com/apache/hive/commit/5f9cb486e4de27129ab0428a71e9cbb0d49fd087]. Thanks [~dengzh] for the review!
> StringToDouble.java violates the ASF 3rd party license guidelines
> -----------------------------------------------------------------
>
> Key: HIVE-28155
> URL: https://issues.apache.org/jira/browse/HIVE-28155
> Project: Hive
> Issue Type: Task
> Components: Serializers/Deserializers
> Affects Versions: 4.0.0-beta-1
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.1.0
>
> The [StringToDouble.java file|https://github.com/apache/hive/blob/c26c25df5963108cd3c4921675e4b67a7f0401fd/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/StringToDouble.java] violates the [ASF 3rd-party work guidelines|https://www.apache.org/legal/src-headers.html#3party].
> The file must not have an ASF header, and the associated license for this work must be part of the distribution.
> The file was introduced by HIVE-15743, and according to the comments under the respective ticket it is a port of the strtod C procedure from a BSD distribution to Java. However, the exact provenance of the file remains unknown, and so does the original license.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28157) Drop unused Arcanist configuration file
[ https://issues.apache.org/jira/browse/HIVE-28157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28157.
Fix Version/s: 4.1.0
Resolution: Fixed
Fixed in [https://github.com/apache/hive/commit/5f9cb486e4de27129ab0428a71e9cbb0d49fd087]. Thanks [~dengzh] for the review!
> Drop unused Arcanist configuration file
> ---------------------------------------
>
> Key: HIVE-28157
> URL: https://issues.apache.org/jira/browse/HIVE-28157
> Project: Hive
> Issue Type: Task
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Fix For: 4.1.0
>
> The [Arcanist|https://secure.phabricator.com/book/phabricator/article/arcanist/] configuration file [.arcconfig|https://github.com/apache/hive/blob/8b287869bd75665cdcf7f59b64389b6e33cfbab8/.arcconfig] was last modified in 2011 (HIVE-2588) and has not been used for a very long time.
> The configuration points to the Facebook instance of Phabricator (https://reviews.facebook.net/), which is no longer in service.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28207) NullPointerException is thrown when checking column uniqueness
[ https://issues.apache.org/jira/browse/HIVE-28207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-28207:
Fix Version/s: 4.1.0
Resolution: Fixed
Status: Resolved (was: Patch Available)
Fixed in [https://github.com/apache/hive/commit/2d855b27d31db6476f18870651db6987816bb5e3]. Thanks for the PR [~okumin] and [~ayushsaxena] for the review!
> NullPointerException is thrown when checking column uniqueness
> --------------------------------------------------------------
>
> Key: HIVE-28207
> URL: https://issues.apache.org/jira/browse/HIVE-28207
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Shohei Okumiya
> Assignee: Shohei Okumiya
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
> In some cases, we skip checking null. For example, the last statement in the following set of queries fails with NPE.
> {code:java}
> CREATE TABLE `store_sales` (`ss_item_sk` bigint);
> CREATE TABLE `household_demographics` (`hd_demo_sk` bigint);
> CREATE TABLE `item` (`i_item_sk` bigint);
> ALTER TABLE `store_sales` ADD CONSTRAINT `pk_ss` PRIMARY KEY (`ss_item_sk`) DISABLE NOVALIDATE RELY;
> ALTER TABLE `item` ADD CONSTRAINT `pk_i` PRIMARY KEY (`i_item_sk`) DISABLE NOVALIDATE RELY;
> ALTER TABLE `store_sales` ADD CONSTRAINT `ss_i` FOREIGN KEY (`ss_item_sk`) REFERENCES `item`(`i_item_sk`) DISABLE NOVALIDATE RELY;
> EXPLAIN
> SELECT i_item_sk
> FROM store_sales, household_demographics, item
> WHERE ss_item_sk = i_item_sk
> {code}
> The NPE happens with HiveJoinConstraintsRule in the above case.
> {code:java} > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:376) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:214) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:270) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:286) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:557) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:542) > ~[hive-service-4.0.0.jar:4.0.0] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_275] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_275] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_275] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_275] > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > ~[hive-service-4.0.0.jar:4.0.0] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_275] > at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_275] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > ~[hadoop-common-3.3.6.jar:?] 
> at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > ~[hive-service-4.0.0.jar:4.0.0] > at com.sun.proxy.$Proxy42.executeStatementAsync(Unknown Source) ~[?:?] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:316) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:652) > ~[hive-service-4.0.0.jar:4.0.0] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1670) > ~[hive-exec-4.0.0.jar:4.0.0] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1650) > ~[hive-exec-4.0.0.jar:4.0.0] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > ~[hive-exec-4.0.0.jar:4.0.0] > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > ~[hive-exec-4.0.0.jar:4.0.0] > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) > ~[hive-service-4.0.0.jar:4.0.0] >
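The defensive pattern behind this class of failure can be illustrated in isolation (a hypothetical sketch; `isColumnUnique` and its inputs are invented for illustration and are not HiveJoinConstraintsRule's actual API): a null result from a metadata lookup must be treated as "uniqueness unknown" rather than dereferenced.

```java
import java.util.Collections;
import java.util.Set;

public class UniquenessCheck {
    // Hypothetical helper: a metadata provider may return null when it cannot
    // determine the unique columns; dereferencing that null is the NPE above.
    // Treat null as "unknown" and conservatively report "not unique".
    static boolean isColumnUnique(Set<Integer> uniqueColumns, int columnIndex) {
        if (uniqueColumns == null) {
            return false; // unknown -> do not claim uniqueness, and no NPE
        }
        return uniqueColumns.contains(columnIndex);
    }

    public static void main(String[] args) {
        System.out.println(isColumnUnique(null, 0));                     // false
        System.out.println(isColumnUnique(Collections.singleton(0), 0)); // true
    }
}
```

Returning a conservative "not unique" only disables an optimization; returning a wrong "unique" (or crashing) would be a correctness bug, which is why the null branch errs on the safe side.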
[jira] [Updated] (HIVE-28211) Restore hive-exec-core jar
[ https://issues.apache.org/jira/browse/HIVE-28211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28211:
Labels: pull-request-available (was: )
> Restore hive-exec-core jar
> --------------------------
>
> Key: HIVE-28211
> URL: https://issues.apache.org/jira/browse/HIVE-28211
> Project: Hive
> Issue Type: Task
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
>
> The hive-exec-core jar is used by Spark, Oozie, Hudi and many other projects. Removal of the hive-exec-core jar has caused the following issues.
> Spark: [https://lists.apache.org/list?d...@hive.apache.org:lte=1M:joda]
> Oozie: [https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg]
> Hudi: [apache/hudi#8147|https://github.com/apache/hudi/issues/8147]
> Until we shade & relocate dependencies in hive-exec, we should restore the hive-exec-core jar.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28211) Restore hive-exec-core jar
Simhadri Govindappa created HIVE-28211:
Summary: Restore hive-exec-core jar
Key: HIVE-28211
URL: https://issues.apache.org/jira/browse/HIVE-28211
Project: Hive
Issue Type: Task
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa
The hive-exec-core jar is used by Spark, Oozie, Hudi and many other projects. Removal of the hive-exec-core jar has caused the following issues.
Spark: [https://lists.apache.org/list?d...@hive.apache.org:lte=1M:joda]
Oozie: [https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg]
Hudi: [apache/hudi#8147|https://github.com/apache/hudi/issues/8147]
Until we shade & relocate dependencies in hive-exec, we should restore the hive-exec-core jar.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
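Downstream projects historically consumed the slim jar via a Maven classifier. A sketch of such a dependency declaration (the version and the `core` classifier here are assumptions to be verified against the restored artifact, not details stated in the ticket):

```xml
<!-- Hypothetical sketch: consuming the restored slim hive-exec artifact.
     Both the version and the "core" classifier are assumptions. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>4.1.0</version>
  <classifier>core</classifier>
</dependency>
```

Without the classifier, consumers get the fat hive-exec jar with bundled (and, until shading/relocation lands, potentially conflicting) third-party classes.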
[jira] [Updated] (HIVE-28210) Print Tez summary by default for tests
[ https://issues.apache.org/jira/browse/HIVE-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-28210:
Description:
This is to set "hive.tez.exec.print.summary" by default in tests, which is quite useful.
{code}
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------
INFO  : Compile Query                           2.12s
INFO  : Prepare Plan                            8.65s
INFO  : Get Query Coordinator (AM)              0.01s
INFO  : Submit Plan                             0.57s
INFO  : Start DAG                               0.04s
INFO  : Run DAG                                 8.86s
INFO  : ----------------------------------------------------
INFO  : ...
{code}

was:
This is to set "hive.tez.exec.print.summary" by default, which is quite useful.
{code}
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------
INFO  : Compile Query                           2.12s
INFO  : Prepare Plan                            8.65s
INFO  : Get Query Coordinator (AM)              0.01s
INFO  : Submit Plan                             0.57s
INFO  : Start DAG                               0.04s
INFO  : Run DAG                                 8.86s
INFO  : ----------------------------------------------------
INFO  : ...
{code}

> Print Tez summary by default for tests
> --------------------------------------
>
> Key: HIVE-28210
> URL: https://issues.apache.org/jira/browse/HIVE-28210
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
>
> This is to set "hive.tez.exec.print.summary" by default in tests, which is quite useful.
> {code}
> INFO  : Query Execution Summary
> INFO  : ----------------------------------------------------
> INFO  : OPERATION                            DURATION
> INFO  : ----------------------------------------------------
> INFO  : Compile Query                           2.12s
> INFO  : Prepare Plan                            8.65s
> INFO  : Get Query Coordinator (AM)              0.01s
> INFO  : Submit Plan                             0.57s
> INFO  : Start DAG                               0.04s
> INFO  : Run DAG                                 8.86s
> INFO  : ----------------------------------------------------
> INFO  : ...
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
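The property named in the ticket can also be enabled per session (shown here as plain HiveQL; where the test default is wired in, e.g. a shared test hive-site.xml, is an assumption not stated above):

```sql
-- Print the Tez execution summary after each query in this session.
set hive.tez.exec.print.summary=true;
```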
[jira] [Updated] (HIVE-28210) Print Tez summary by default in tests
[ https://issues.apache.org/jira/browse/HIVE-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-28210:
Summary: Print Tez summary by default in tests (was: Print Tez summary by default for tests)
> Print Tez summary by default in tests
> -------------------------------------
>
> Key: HIVE-28210
> URL: https://issues.apache.org/jira/browse/HIVE-28210
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
>
> This is to set "hive.tez.exec.print.summary" by default in tests, which is quite useful.
> {code}
> INFO  : Query Execution Summary
> INFO  : ----------------------------------------------------
> INFO  : OPERATION                            DURATION
> INFO  : ----------------------------------------------------
> INFO  : Compile Query                           2.12s
> INFO  : Prepare Plan                            8.65s
> INFO  : Get Query Coordinator (AM)              0.01s
> INFO  : Submit Plan                             0.57s
> INFO  : Start DAG                               0.04s
> INFO  : Run DAG                                 8.86s
> INFO  : ----------------------------------------------------
> INFO  : ...
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)