[jira] [Commented] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL
[ https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758371#comment-17758371 ]

Sungwoo Park commented on HIVE-27303:
-------------------------------------

Could someone update the commit log so that Seonggon gets credit for the pull request resolving this JIRA?

> select query result is different when enable/disable mapjoin with UNION ALL
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-27303
>                 URL: https://issues.apache.org/jira/browse/HIVE-27303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mahesh Raju Somalaraju
>            Assignee: Mahesh Raju Somalaraju
>            Priority: Major
>              Labels: pull-request-available
>
> The result of a select query with UNION ALL differs depending on whether map join is enabled or disabled.
> The steps to reproduce are below.
> With map join disabled (hive.auto.convert.join=false), the query should return no rows, but it returns duplicate rows.
> The same query works correctly with map join enabled.
> Expected result: no rows.
> Problem: duplicate rows are returned.
> Steps:
> ------
> SET hive.server2.tez.queue.access.check=true;
> SET tez.queue.name=default;
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion=none;
> SET hive.execution.engine=tez;
> SET hive.stats.autogather=true;
> SET hive.server2.enable.doAs=false;
> SET hive.auto.convert.join=false;
> drop table if exists hive1_tbl_data;
> drop table if exists hive2_tbl_data;
> drop table if exists hive3_tbl_data;
> drop table if exists hive4_tbl_data;
> CREATE EXTERNAL TABLE hive1_tbl_data (COLUMID string, COLUMN_FN string, COLUMN_LN string, EMAIL string, COL_UPDATED_DATE timestamp, PK_COLUM string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> TBLPROPERTIES (
>   'TRANSLATED_TO_EXTERNAL'='true',
>   'bucketing_version'='2',
>   'external.table.purge'='true',
>   'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive2_tbl_data (COLUMID string, COLUMN_FN string, COLUMN_LN string, EMAIL string, COL_UPDATED_DATE timestamp, PK_COLUM string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> TBLPROPERTIES (
>   'TRANSLATED_TO_EXTERNAL'='true',
>   'bucketing_version'='2',
>   'external.table.purge'='true',
>   'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive3_tbl_data (COLUMID string, COLUMN_FN string, COLUMN_LN string, EMAIL string, COL_UPDATED_DATE timestamp, PK_COLUM string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> TBLPROPERTIES (
>   'TRANSLATED_TO_EXTERNAL'='true',
>   'bucketing_version'='2',
>   'external.table.purge'='true',
>   'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive4_tbl_data (COLUMID string, COLUMN_FN string, COLUMN_LN string, EMAIL string, COL_UPDATED_DATE timestamp, PK_COLUM string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> TBLPROPERTIES (
>   'TRANSLATED_TO_EXTERNAL'='true',
>   'bucketing_version'='2',
>   'external.table.purge'='true',
>   'parquet.compression'='SNAPPY');
>
> insert into table hive1_tbl_data select '1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1';
> insert into table hive1_tbl_data select '2','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1';
> insert into table hive2_tbl_data select
[jira] [Commented] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL
[ https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758354#comment-17758354 ]

Ayush Saxena commented on HIVE-27303:
-------------------------------------

[~maheshrajus]/[~lvegh] can you please update the fix version, that is a mandatory field, the fix version is used to prepare the release notes during releases
[jira] [Resolved] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL
[ https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahesh Raju Somalaraju resolved HIVE-27303.
------------------------------------------
    Resolution: Fixed
[jira] [Commented] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL
[ https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758346#comment-17758346 ]

Mahesh Raju Somalaraju commented on HIVE-27303:
-----------------------------------------------

Merged the PR.
https://github.com/apache/hive/pull/4406
[jira] [Updated] (HIVE-27644) Backport of HIVE-17917, HIVE-21457, HIVE-22582
[ https://issues.apache.org/jira/browse/HIVE-27644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27644:
---------------------------------
    Labels: pull-request-available  (was: )

> Backport of HIVE-17917, HIVE-21457, HIVE-22582
> ----------------------------------------------
>
>                 Key: HIVE-27644
>                 URL: https://issues.apache.org/jira/browse/HIVE-27644
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 3.2.0
>            Reporter: Aman Raj
>            Assignee: Aman Raj
>            Priority: Major
>              Labels: pull-request-available
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27644) Backport of HIVE-17917, HIVE-21457, HIVE-22582
Aman Raj created HIVE-27644:
----------------------------

             Summary: Backport of HIVE-17917, HIVE-21457, HIVE-22582
                 Key: HIVE-27644
                 URL: https://issues.apache.org/jira/browse/HIVE-27644
             Project: Hive
          Issue Type: Sub-task
    Affects Versions: 3.2.0
            Reporter: Aman Raj
            Assignee: Aman Raj
[jira] [Resolved] (HIVE-27431) Clean invalid properties in test module
[ https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HIVE-27431.
--------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> Clean invalid properties in test module
> ---------------------------------------
>
>                 Key: HIVE-27431
>                 URL: https://issues.apache.org/jira/browse/HIVE-27431
>             Project: Hive
>          Issue Type: Test
>          Components: Test
>            Reporter: zhangbutao
>            Assignee: zhangbutao
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: output.out
>
>
> In the *data/conf* module, *hive-site.xml* is used for qtest. It keeps many invalid properties, and if you run a test in an IDE, you will see lots of WARN messages:
> {code:java}
> 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.mapjoin.max.gc.time.percentage does not exist
> 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.size does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.override does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.metadb.dir does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.min does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.hivesite does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.max does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.maxSize does not exist
> 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.metastoresite does not exist
> 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.recordStats does not exist
> 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.arena.size does not exist
> 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.stats.key.prefix.reserve.length does not exist
> {code}
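One way to find out which entries to clean out of hive-site.xml is to pull the property names from warnings like the ones quoted above. The following is only a minimal sketch under that assumption, not the approach taken by the actual patch: the class InvalidHiveConfNames and its extract method are hypothetical names, and the regular expression relies solely on the message format shown in the log excerpt.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hive): collects the property names that
// HiveConf reports as unknown, based on the WARN message format quoted above.
class InvalidHiveConfNames {
    private static final Pattern UNKNOWN_PROPERTY =
        Pattern.compile("HiveConf of name (\\S+) does not exist");

    // Returns the distinct property names in first-seen order.
    static Set<String> extract(List<String> logLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = UNKNOWN_PROPERTY.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.mapjoin.max.gc.time.percentage does not exist",
            "2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.metadb.dir does not exist");
        // Print each candidate for removal on its own line.
        extract(sample).forEach(System.out::println);
    }
}
```

Feeding the attached test output through such a helper would give a de-duplicated list of candidates; each one would still need a manual check before it is removed from the test configuration.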
[jira] [Updated] (HIVE-27431) Clean invalid properties in test module
[ https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HIVE-27431:
-------------------------------
    Summary: Clean invalid properties in test module  (was: Clean invalid properties in test moduel)
[jira] [Commented] (HIVE-27431) Clean invalid properties in test moduel
[ https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758109#comment-17758109 ]

Ayush Saxena commented on HIVE-27431:
-------------------------------------

Committed to master.
Thanx [~zhangbutao] for the contribution & [~dkuzmenko] for the review!!!
[jira] [Updated] (HIVE-27283) Use tez.local.mode in HiveServer2 for trivial queries
[ https://issues.apache.org/jira/browse/HIVE-27283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Turoczy updated HIVE-27283:
---------------------------------
    Labels: check hive-4.1  (was: check)

> Use tez.local.mode in HiveServer2 for trivial queries
> -----------------------------------------------------
>
>                 Key: HIVE-27283
>                 URL: https://issues.apache.org/jira/browse/HIVE-27283
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>              Labels: check, hive-4.1
>
> Today, a query like this:
> {code}
> INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
> {code}
> spins up a TezAM and containers. I believe this is not optimal, even if we already have a Tez application running. Not to mention setups where only a HiveServer2 is alive and TezAMs + LLAP executors are spun up on demand, e.g. Cloudera's Data Warehouse, but I'm assuming other companies might do a similar thing in the cloud.
> With this optimization, a possible risk is overwhelming HiveServer2 with such queries; this scenario should be handled with care.
> My proposal is to maintain a local Tez session pool (default size 0, recommended is 1...4) in HS2, and to identify at compile time the "trivial queries" that currently need a Tez application (like the INSERT INTO above).
> The first implementation can include only simple INSERT INTO queries, and we can decide the rest later.
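The routing idea in the proposal above can be illustrated with a small sketch. Everything here is hypothetical: LocalTezRouter is not a Hive class, the regular expression stands in for what would presumably be a check on the compiled query plan, and "trivial" is assumed to mean a plain INSERT INTO ... VALUES statement. The semaphore models the bounded local session pool, so a pool size of 0 disables local execution and an exhausted pool falls back to the regular Tez path instead of overloading HiveServer2.

```java
import java.util.concurrent.Semaphore;
import java.util.regex.Pattern;

// Hypothetical sketch of the routing decision; none of these names exist in Hive.
class LocalTezRouter {
    // Assumption: "trivial" means a plain INSERT INTO ... VALUES statement.
    private static final Pattern TRIVIAL_INSERT = Pattern.compile(
        "^\\s*INSERT\\s+INTO\\s+(TABLE\\s+)?\\S+\\s+VALUES\\b.*",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private final Semaphore localSlots;

    // poolSize 0 (the proposed default) disables local execution entirely.
    LocalTezRouter(int poolSize) {
        this.localSlots = new Semaphore(poolSize);
    }

    static boolean isTrivial(String sql) {
        return TRIVIAL_INSERT.matcher(sql).matches();
    }

    // Returns true if the query may run in a local session; the caller must
    // call release() when the query finishes. Returns false for non-trivial
    // queries and when all local slots are busy.
    boolean tryAcquireLocal(String sql) {
        return isTrivial(sql) && localSlots.tryAcquire();
    }

    void release() {
        localSlots.release();
    }

    public static void main(String[] args) {
        LocalTezRouter router = new LocalTezRouter(1);
        String sql = "INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28)";
        System.out.println("run locally: " + router.tryAcquireLocal(sql));
        router.release();
    }
}
```

Using a non-blocking tryAcquire rather than a blocking acquire is a deliberate choice in this sketch: a trivial query that cannot get a local slot is simply sent down the existing path, so it never waits on HiveServer2 resources.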
[jira] [Updated] (HIVE-27283) Use tez.local.mode in HiveServer2 for trivial queries
[ https://issues.apache.org/jira/browse/HIVE-27283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Turoczy updated HIVE-27283:
---------------------------------
    Labels: check  (was: )
[jira] [Assigned] (HIVE-27643) Exclude compaction queries from ranger policies
[ https://issues.apache.org/jira/browse/HIVE-27643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Végh reassigned HIVE-27643:
---------------------------------

    Assignee: László Végh

> Exclude compaction queries from ranger policies
> -----------------------------------------------
>
>                 Key: HIVE-27643
>                 URL: https://issues.apache.org/jira/browse/HIVE-27643
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Critical
>
> Applying masking or filtering Ranger policies to the compaction users causes data loss, as the policies will also be applied to the compaction queries.
> While this is a kind of misconfiguration, the result is so bad that users should be protected from it by automatically excluding compaction queries from ALL Ranger policies.
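The exclusion described above boils down to skipping the policy rewrite whenever a query originates from compaction. The sketch below shows only that control flow; it is an illustration, not the eventual fix. CompactionPolicyGuard, QueryOrigin and the policyRewrite parameter are hypothetical names, and no real Ranger or Hive authorizer API is used.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch (no real Ranger/Hive API): masking and row-filter
// rewrites are applied to user queries only, never to compaction queries.
class CompactionPolicyGuard {
    // Hypothetical marker for where a query came from.
    enum QueryOrigin { USER, COMPACTION }

    // policyRewrite stands in for whatever applies masking/filter policies.
    static String applyPolicies(String sql, QueryOrigin origin,
                                UnaryOperator<String> policyRewrite) {
        if (origin == QueryOrigin.COMPACTION) {
            // Return the query untouched so compaction reads every row unmasked.
            return sql;
        }
        return policyRewrite.apply(sql);
    }

    public static void main(String[] args) {
        UnaryOperator<String> filter = q -> q + " WHERE region = 'EU'";
        System.out.println(applyPolicies("SELECT * FROM t", QueryOrigin.USER, filter));
        System.out.println(applyPolicies("SELECT * FROM t", QueryOrigin.COMPACTION, filter));
    }
}
```

Keying the decision on the origin of the query rather than on the user name matches the request in the issue: the protection should hold even when a policy has been attached to the compaction user by mistake.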
[jira] [Work started] (HIVE-27643) Exclude compaction queries from ranger policies
[ https://issues.apache.org/jira/browse/HIVE-27643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-27643 started by László Végh.
------------------------------------------
[jira] [Created] (HIVE-27643) Exclude compaction queries from ranger policies
László Végh created HIVE-27643:
-------------------------------

             Summary: Exclude compaction queries from ranger policies
                 Key: HIVE-27643
                 URL: https://issues.apache.org/jira/browse/HIVE-27643
             Project: Hive
          Issue Type: Bug
            Reporter: László Végh

Applying masking or filtering Ranger policies to the compaction users causes data loss, as the policies will also be applied to the compaction queries.

While this is a kind of misconfiguration, the result is so bad that users should be protected from it by automatically excluding compaction queries from ALL Ranger policies.
[jira] [Commented] (HIVE-27642) StartMiniHS2Cluster fails to run due to missing JDBC driver with Postgres
[ https://issues.apache.org/jira/browse/HIVE-27642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757985#comment-17757985 ]

Stamatis Zampetakis commented on HIVE-27642:
--------------------------------------------

Thanks for logging this [~zratkai], ping me if you need a review.

> StartMiniHS2Cluster fails to run due to missing JDBC driver with Postgres
> -------------------------------------------------------------------------
>
>                 Key: HIVE-27642
>                 URL: https://issues.apache.org/jira/browse/HIVE-27642
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Zoltán Rátkai
>            Assignee: Zoltán Rátkai
>            Priority: Major
>
[jira] [Created] (HIVE-27642) StartMiniHS2Cluster fails to run due to missing JDBC driver with Postgres
Zoltán Rátkai created HIVE-27642:
---------------------------------

             Summary: StartMiniHS2Cluster fails to run due to missing JDBC driver with Postgres
                 Key: HIVE-27642
                 URL: https://issues.apache.org/jira/browse/HIVE-27642
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 4.0.0-beta-1
            Reporter: Zoltán Rátkai
            Assignee: Zoltán Rátkai
[jira] [Resolved] (HIVE-27617) Backport of HIVE-18284: Fix NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-27617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan resolved HIVE-27617.
------------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

> Backport of HIVE-18284: Fix NPE when inserting data with 'distribute by' clause with dynpart sort optimization
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27617
>                 URL: https://issues.apache.org/jira/browse/HIVE-27617
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Aman Raj
>            Assignee: Aman Raj
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.2.0
>
[jira] [Updated] (HIVE-27639) Performance counters for easier investigations
[ https://issues.apache.org/jira/browse/HIVE-27639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-27639:
-------------------------------
    Description: 
We need to move performance measurements to the next level and implement whatever is needed to let us find answers more easily to problems like “query is slow”. The problem is that we keep digging into logs and watching metrics that are provided by *something* in the environment (which can be anything that the actual vendor implements outside of Hive).

Let's try to localize the environment problems to the interval of the slow query and expose them through counters.

Also, let's keep in mind that performance measurements ideally should never cause performance problems themselves: heavyweight measurements should be disabled by default.

  was:We need to move performance measurements to the next level and implement whatever is needed to make us able to find easier answers to problems like “query is slow”. The problem is that we keep digging into logs + watching metrics that are provided by *something* in the environment (that can be anything that the actual vendor implements outside of hive),

> Performance counters for easier investigations
> ----------------------------------------------
>
>                 Key: HIVE-27639
>                 URL: https://issues.apache.org/jira/browse/HIVE-27639
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>
> We need to move performance measurements to the next level and implement whatever is needed to let us find answers more easily to problems like “query is slow”. The problem is that we keep digging into logs and watching metrics that are provided by *something* in the environment (which can be anything that the actual vendor implements outside of Hive).
> Let's try to localize the environment problems to the interval of the slow query and expose them through counters.
> Also, let's keep in mind that performance measurements ideally should never cause performance problems themselves: heavyweight measurements should be disabled by default.
[jira] [Updated] (HIVE-27639) Performance counters for easier investigations
[ https://issues.apache.org/jira/browse/HIVE-27639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27639: Description: We need to move performance measurements to the next level and implement whatever is needed to make us able to find easier answers to problems like “query is slow”. The problem is that we keep digging into logs + watching metrics that are provided by *something* in the environment (that can be anything that the actual vendor implements outside of hive), > Performance counters for easier investigations > -- > > Key: HIVE-27639 > URL: https://issues.apache.org/jira/browse/HIVE-27639 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > > We need to move performance measurements to the next level and implement > whatever is needed to make us able to find easier answers to problems like > “query is slow”. The problem is that we keep digging into logs + watching > metrics that are provided by *something* in the environment (that can be > anything that the actual vendor implements outside of hive), -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27641) Counters for metastore usage
[ https://issues.apache.org/jira/browse/HIVE-27641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27641: Description: While compiling a query in hs2, we might want to see the following: 1. how many calls happened to the metastore while compiling the query 2. what's the overall duration that the query spent while communicating with the metastore > Counters for metastore usage > > > Key: HIVE-27641 > URL: https://issues.apache.org/jira/browse/HIVE-27641 > Project: Hive > Issue Type: Sub-task >Reporter: László Bodor >Priority: Major > > While compiling a query in hs2, we might want to see the following: > 1. how many calls happened to the metastore while compiling the query > 2. what's the overall duration that the query spent while communicating with > the metastore -- This message was sent by Atlassian Jira (v8.20.10#820010)
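The two counters requested above (number of metastore calls, and total time spent talking to the metastore) could be collected by a small wrapper around each call. The sketch below is only an illustration under assumptions: `MetastoreCallCounters` and `timed` are hypothetical names, not existing Hive classes or methods, and how such an object would be scoped to a single query compilation in hs2 is left open.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical per-query counters; this is not an existing Hive class. */
class MetastoreCallCounters {
    // Counter 1: how many metastore calls happened.
    private final AtomicLong callCount = new AtomicLong();
    // Counter 2: overall wall-clock time spent inside those calls, in nanoseconds.
    private final AtomicLong totalNanos = new AtomicLong();

    /** Runs one metastore call and records it, even if the call throws. */
    <T> T timed(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            callCount.incrementAndGet();
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    long getCallCount() {
        return callCount.get();
    }

    long getTotalNanos() {
        return totalNanos.get();
    }
}
```

A caller would wrap each client invocation, for example `counters.timed(() -> client.getTable(db, tbl))`, where `client` stands in for whatever metastore client the compiler uses, and then publish both values with the query summary once compilation finishes.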
[jira] [Created] (HIVE-27641) Counters for metastore usage
László Bodor created HIVE-27641: --- Summary: Counters for metastore usage Key: HIVE-27641 URL: https://issues.apache.org/jira/browse/HIVE-27641 Project: Hive Issue Type: Sub-task Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27640) Counter for query concurrency
[ https://issues.apache.org/jira/browse/HIVE-27640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27640: Description: This is kind of hard to catch easily, but I would like to see something/anything about query concurrency in the query counters. This way we can instantly see in the query summary what happened. I mean counters like: 1. how many queries were running when this query arrived 2. same as 1) but at the query stage level 2a) how many queries were being compiled (or waiting for compilation) when this query started to compile (or started to be enqueued for compilation) 2b) how many queries were waiting for a coordinator when this query started to get a coordinator 2c) how many queries were in the Run DAG phase, when this query started to run DAG > Counter for query concurrency > - > > Key: HIVE-27640 > URL: https://issues.apache.org/jira/browse/HIVE-27640 > Project: Hive > Issue Type: Sub-task >Reporter: László Bodor >Priority: Major > > This is kind of hard to catch easily, but I would like to see > something/anything about query concurrency in the query counters. This way we > can instantly see in the query summary what happened. I mean counters like: > 1. how many queries were running when this query arrived > 2. same as 1) but at the query stage level > 2a) how many queries were being compiled (or waiting for compilation) when > this query started to compile (or started to be enqueued for compilation) > 2b) how many queries were waiting for a coordinator when this query started > to get a coordinator > 2c) how many queries were in the Run DAG phase, when this query started to > run DAG -- This message was sent by Atlassian Jira (v8.20.10#820010)
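One way to obtain the counters listed above is to keep an in-flight count per phase and snapshot it at the moment a query enters that phase. The following is a minimal sketch under assumptions: `QueryConcurrencyTracker`, its `Phase` enum, and the `enter`/`leave` methods are hypothetical names chosen for illustration, not part of Hive. The phases mirror items 1 and 2a to 2c of the description.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical tracker shared by all queries in one HiveServer2 instance; not an existing Hive class. */
class QueryConcurrencyTracker {
    /** RUNNING covers item 1; the other phases cover items 2a, 2b and 2c. */
    enum Phase { RUNNING, COMPILE, WAIT_COORDINATOR, RUN_DAG }

    // The map is fully populated in the constructor and only read afterwards;
    // all concurrent updates go through the AtomicInteger values.
    private final Map<Phase, AtomicInteger> inFlight = new EnumMap<>(Phase.class);

    QueryConcurrencyTracker() {
        for (Phase phase : Phase.values()) {
            inFlight.put(phase, new AtomicInteger());
        }
    }

    /** Registers a query entering the phase and returns how many other queries were already in it. */
    int enter(Phase phase) {
        return inFlight.get(phase).getAndIncrement();
    }

    /** Must be called when the query leaves the phase, including on failure. */
    void leave(Phase phase) {
        inFlight.get(phase).decrementAndGet();
    }
}
```

The value returned by `enter` is what would be stored as the query's counter for that phase, so the snapshot costs a single atomic increment and stays cheap enough to be enabled by default.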
[jira] [Created] (HIVE-27640) Counter for query concurrency
László Bodor created HIVE-27640: --- Summary: Counter for query concurrency Key: HIVE-27640 URL: https://issues.apache.org/jira/browse/HIVE-27640 Project: Hive Issue Type: Sub-task Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27639) Performance counters for easier investigations
László Bodor created HIVE-27639: --- Summary: Performance counters for easier investigations Key: HIVE-27639 URL: https://issues.apache.org/jira/browse/HIVE-27639 Project: Hive Issue Type: Improvement Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27613) Backport of HIVE-22204: Beeline option to show/not show execution report
[ https://issues.apache.org/jira/browse/HIVE-27613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-27613. - Fix Version/s: 3.2.0 Resolution: Fixed > Backport of HIVE-22204: Beeline option to show/not show execution report > > > Key: HIVE-27613 > URL: https://issues.apache.org/jira/browse/HIVE-27613 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27624) Backport of HIVE-26080: Upgrade accumulo-core to 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-27624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-27624. - Fix Version/s: 3.2.0 Resolution: Fixed > Backport of HIVE-26080: Upgrade accumulo-core to 1.10.1 > --- > > Key: HIVE-27624 > URL: https://issues.apache.org/jira/browse/HIVE-27624 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27630) Iceberg: Fast forward/rebase branch
[ https://issues.apache.org/jira/browse/HIVE-27630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reassigned HIVE-27630: --- Assignee: Ayush Saxena > Iceberg: Fast forward/rebase branch > --- > > Key: HIVE-27630 > URL: https://issues.apache.org/jira/browse/HIVE-27630 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Assignee: Ayush Saxena >Priority: Major > > Add support to fastForward main branch to the head of feature-branch to > update the main table state. > {code} > table.manageSnapshots().fastForward("main", "feature-branch").commit() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)