[jira] [Updated] (HIVE-27428) CTAS fails with SemanticException when join subquery has complex type column and false filter predicate
[ https://issues.apache.org/jira/browse/HIVE-27428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-27428: -- Description: Repro steps: {code:java} drop table if exists table1; drop table if exists table2; create table table1 (a string, b string); create table table2 (complex_column create table table2 (complex_column array, values:array); -- CTAS failing query create table table3 as with t1 as (select * from table1), t2 as (select * from table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code} Exception: {code:java} Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: t2.complex_column at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194) {code} was: Repro steps: {code:java} drop table if exists table1; drop table if exists table2; create table table1 (a string, b string); create table table2 (complex_column create table table2 (complex_column array, values:array); -- CTAS failing query create table table3 as with t1 as (select * from table1), t2 as (select * from table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code} Exception: {code:java} Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: t2.df0rrd_prod_wers_x at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194) {code} > CTAS fails with SemanticException when join subquery has complex type column > and false filter predicate > --- > > Key: HIVE-27428 > URL: https://issues.apache.org/jira/browse/HIVE-27428 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Priority: Major > > Repro steps: > {code:java} > drop table if exists table1; > drop table if exists table2; > create table table1 (a string, b string); > create table table2
[jira] [Created] (HIVE-27428) CTAS fails with SemanticException when join subquery has complex type column and false filter predicate
Naresh P R created HIVE-27428: - Summary: CTAS fails with SemanticException when join subquery has complex type column and false filter predicate Key: HIVE-27428 URL: https://issues.apache.org/jira/browse/HIVE-27428 Project: Hive Issue Type: Bug Reporter: Naresh P R Repro steps: {code:java} drop table if exists table1; drop table if exists table2; create table table1 (a string, b string); create table table2 (complex_column create table table2 (complex_column array, values:array); -- CTAS failing query create table table3 as with t1 as (select * from table1), t2 as (select * from table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code} Exception: {code:java} Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: t2.df0rrd_prod_wers_x at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27964) Support drop stats similar to Impala
[ https://issues.apache.org/jira/browse/HIVE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799522#comment-17799522 ] Naresh P R commented on HIVE-27964: --- Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, dropped & re-added with new tableName. Instead clearing the stats before rename & computing later would help to speed up the process. Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS to avoid touching PART_COL_STATS for table/partition renames. > Support drop stats similar to Impala > > > Key: HIVE-27964 > URL: https://issues.apache.org/jira/browse/HIVE-27964 > Project: Hive > Issue Type: New Feature >Reporter: Naresh P R >Priority: Major > > Hive should support drop stats similar to impala. > https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-27964) Support drop stats similar to Impala
[ https://issues.apache.org/jira/browse/HIVE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799522#comment-17799522 ] Naresh P R edited comment on HIVE-27964 at 12/21/23 5:47 PM: - Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Instead clearing the stats before rename & computing later would help to speed up the process. Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS to avoid touching PART_COL_STATS for table/partition renames. was (Author: nareshpr): Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, dropped & re-added with new tableName. Instead clearing the stats before rename & computing later would help to speed up the process. Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS to avoid touching PART_COL_STATS for table/partition renames. > Support drop stats similar to Impala > > > Key: HIVE-27964 > URL: https://issues.apache.org/jira/browse/HIVE-27964 > Project: Hive > Issue Type: New Feature >Reporter: Naresh P R >Priority: Major > > Hive should support drop stats similar to impala. > https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27965) Table/partition rename takes a long time at PART_COL_STATS for wide tables
Naresh P R created HIVE-27965: - Summary: Table/partition rename takes a long time at PART_COL_STATS for wide tables Key: HIVE-27965 URL: https://issues.apache.org/jira/browse/HIVE-27965 Project: Hive Issue Type: Improvement Reporter: Naresh P R Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS to avoid touching PART_COL_STATS for table/partition renames. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27964) Support drop stats similar to Impala
Naresh P R created HIVE-27964: - Summary: Support drop stats similar to Impala Key: HIVE-27964 URL: https://issues.apache.org/jira/browse/HIVE-27964 Project: Hive Issue Type: New Feature Reporter: Naresh P R Hive should support drop stats similar to impala. https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27965) Table/partition rename takes a long time at PART_COL_STATS for wide tables
[ https://issues.apache.org/jira/browse/HIVE-27965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799649#comment-17799649 ] Naresh P R commented on HIVE-27965: --- [~zhangbutao] Yes, this helps. Thanks for letting me know. > Table/partition rename takes a long time at PART_COL_STATS for wide tables > -- > > Key: HIVE-27965 > URL: https://issues.apache.org/jira/browse/HIVE-27965 > Project: Hive > Issue Type: Improvement >Reporter: Naresh P R >Priority: Major > > Partition table rename gets clogged at PART_COL_STATS for wide tables. > {code:java} > CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( > ... > `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, > `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, > `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT > NULL, > ...){code} > Since PART_COL_STATS holds db_name & table_name, incase of table rename, > every row in PART_COL_STATS associated with the table should be fetched, > stored in memory, delete & re-insert with new db/table/partition name. > > Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS, instead use > TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition > renames. > Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27965) Table/partition rename takes a long time at PART_COL_STATS for wide tables
[ https://issues.apache.org/jira/browse/HIVE-27965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-27965: -- Description: Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use & use TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition renames. Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING. was: Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS to avoid touching PART_COL_STATS for table/partition renames. > Table/partition rename takes a long time at PART_COL_STATS for wide tables > -- > > Key: HIVE-27965 > URL: https://issues.apache.org/jira/browse/HIVE-27965 > Project: Hive > Issue Type: Improvement >Reporter: Naresh P R >Priority: Major > > Partition table rename gets clogged at PART_COL_STATS for wide tables. > {code:java} > CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( > ... > `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, > `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, > `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT > NULL, > ...){code} > Since PART_COL_STATS holds db_name & table_name, incase of table rename, > every row in PART_COL_STATS associated with the table should be fetched, > stored in memory, delete & re-insert with new db/table/partition name. > > Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use & use > TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition > renames. > Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27965) Table/partition rename takes a long time at PART_COL_STATS for wide tables
[ https://issues.apache.org/jira/browse/HIVE-27965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-27965: -- Description: Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS, instead use TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition renames. Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING. was: Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use & use TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition renames. Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING. > Table/partition rename takes a long time at PART_COL_STATS for wide tables > -- > > Key: HIVE-27965 > URL: https://issues.apache.org/jira/browse/HIVE-27965 > Project: Hive > Issue Type: Improvement >Reporter: Naresh P R >Priority: Major > > Partition table rename gets clogged at PART_COL_STATS for wide tables. > {code:java} > CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( > ... > `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, > `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, > `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT > NULL, > ...){code} > Since PART_COL_STATS holds db_name & table_name, incase of table rename, > every row in PART_COL_STATS associated with the table should be fetched, > stored in memory, delete & re-insert with new db/table/partition name. > > Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS, instead use > TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition > renames. > Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-27964) Support drop stats similar to Impala
[ https://issues.apache.org/jira/browse/HIVE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799522#comment-17799522 ] Naresh P R edited comment on HIVE-27964 at 12/21/23 6:17 PM: - Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Instead clearing the stats before rename & computing later would help to speed up the process. Just raised another optimization HIVE-27965, to remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition renames + can be used in indexes as well. was (Author: nareshpr): Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Instead clearing the stats before rename & computing later would help to speed up the process. Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition renames + can be used in indexes as well. > Support drop stats similar to Impala > > > Key: HIVE-27964 > URL: https://issues.apache.org/jira/browse/HIVE-27964 > Project: Hive > Issue Type: New Feature >Reporter: Naresh P R >Priority: Major > > Hive should support drop stats similar to impala. > https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-27964) Support drop stats similar to Impala
[ https://issues.apache.org/jira/browse/HIVE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799522#comment-17799522 ] Naresh P R edited comment on HIVE-27964 at 12/21/23 6:10 PM: - Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Instead clearing the stats before rename & computing later would help to speed up the process. Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition renames + can be used in indexes as well. was (Author: nareshpr): Partition table rename gets clogged at PART_COL_STATS for wide tables. {code:java} CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( ... `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL, ...){code} Since PART_COL_STATS holds db_name & table_name, incase of table rename, every row in PART_COL_STATS associated with the table should be fetched, stored in memory, delete & re-insert with new db/table/partition name. Instead clearing the stats before rename & computing later would help to speed up the process. Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS to avoid touching PART_COL_STATS for table/partition renames. > Support drop stats similar to Impala > > > Key: HIVE-27964 > URL: https://issues.apache.org/jira/browse/HIVE-27964 > Project: Hive > Issue Type: New Feature >Reporter: Naresh P R >Priority: Major > > Hive should support drop stats similar to impala. > https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL
[ https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789609#comment-17789609 ] Naresh P R commented on HIVE-27885: --- Thank you [~rameshkumar] & [~ngangam] for the review and commit. > Cast decimal from string with space without digits before dot returns NULL > -- > > Key: HIVE-27885 > URL: https://issues.apache.org/jira/browse/HIVE-27885 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > eg., > select cast(". " as decimal(8,4)) > {code:java} > – Expected output > 0. > – Actual output > NULL > {code} > select cast("0. " as decimal(8,4)) > {code:java} > – Actual output > 0. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy
Naresh P R created HIVE-27876: - Summary: Incorrect query results on tables with ClusterBy & SortBy Key: HIVE-27876 URL: https://issues.apache.org/jira/browse/HIVE-27876 Project: Hive Issue Type: Bug Reporter: Naresh P R Repro: {code:java} create external table test_bucket(age int, name string, dept string) clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored as orc; insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2'); insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2'); //empty wrong results with default CDP configs select age, name, count(*) from test_bucket group by age, name having count(*) > 1; +--+---+--+ | age | name | _c2 | +--+---+--+ +--+---+--+ // Workaround set hive.map.aggr=false; select age, name, count(*) from test_bucket group by age, name having count(*) > 1; +--++--+ | age | name | _c2 | +--++--+ | 1 | user1 | 2 | | 2 | user2 | 2 | +--++--+ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy
[ https://issues.apache.org/jira/browse/HIVE-27876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-27876: -- Description: Repro: {code:java} create external table test_bucket(age int, name string, dept string) clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored as orc; insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2'); insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2'); //empty wrong results select age, name, count(*) from test_bucket group by age, name having count(*) > 1; +--+---+--+ | age | name | _c2 | +--+---+--+ +--+---+--+ // Workaround set hive.map.aggr=false; select age, name, count(*) from test_bucket group by age, name having count(*) > 1; +--++--+ | age | name | _c2 | +--++--+ | 1 | user1 | 2 | | 2 | user2 | 2 | +--++--+ {code} was: Repro: {code:java} create external table test_bucket(age int, name string, dept string) clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored as orc; insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2'); insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2'); //empty wrong results with default CDP configs select age, name, count(*) from test_bucket group by age, name having count(*) > 1; +--+---+--+ | age | name | _c2 | +--+---+--+ +--+---+--+ // Workaround set hive.map.aggr=false; select age, name, count(*) from test_bucket group by age, name having count(*) > 1; +--++--+ | age | name | _c2 | +--++--+ | 1 | user1 | 2 | | 2 | user2 | 2 | +--++--+ {code} > Incorrect query results on tables with ClusterBy & SortBy > - > > Key: HIVE-27876 > URL: https://issues.apache.org/jira/browse/HIVE-27876 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Priority: Major > > Repro: > > {code:java} > create external table test_bucket(age int, name string, dept string) > clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored > as orc; > insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2'); > insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2'); > //empty wrong results > select age, name, count(*) from test_bucket group by age, name having > count(*) > 1; > +--+---+--+ > | age | name | _c2 | > +--+---+--+ > +--+---+--+ > // Workaround > set hive.map.aggr=false; > select age, name, count(*) from test_bucket group by age, name having > count(*) > 1; > +--++--+ > | age | name | _c2 | > +--++--+ > | 1 | user1 | 2 | > | 2 | user2 | 2 | > +--++--+ {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL
Naresh P R created HIVE-27885: - Summary: Cast decimal from string with space without digits before dot returns NULL Key: HIVE-27885 URL: https://issues.apache.org/jira/browse/HIVE-27885 Project: Hive Issue Type: Bug Environment: eg., select cast(". " as decimal(8,4)) -- Expected output 0. -- Actual output NULL select cast("0. " as decimal(8,4)) -- Actual output 0. Reporter: Naresh P R Assignee: Naresh P R -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL
[ https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-27885: -- Environment: (was: eg., select cast(". " as decimal(8,4)) -- Expected output 0. -- Actual output NULL select cast("0. " as decimal(8,4)) -- Actual output 0.) > Cast decimal from string with space without digits before dot returns NULL > -- > > Key: HIVE-27885 > URL: https://issues.apache.org/jira/browse/HIVE-27885 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL
[ https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-27885: -- Description: eg., select cast(". " as decimal(8,4)) {code:java} – Expected output 0. – Actual output NULL {code} select cast("0. " as decimal(8,4)) {code:java} – Actual output 0. {code} was: eg., select cast(". " as decimal(8,4)) {code:java} – Expected output 0. – Actual output NULL {code} select cast("0. " as decimal(8,4)) {code:java} – Actual output 0. {code} > Cast decimal from string with space without digits before dot returns NULL > -- > > Key: HIVE-27885 > URL: https://issues.apache.org/jira/browse/HIVE-27885 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > > eg., > select cast(". " as decimal(8,4)) > {code:java} > – Expected output > 0. > – Actual output > NULL > {code} > select cast("0. " as decimal(8,4)) > {code:java} > – Actual output > 0. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL
[ https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-27885: -- Description: eg., select cast(". " as decimal(8,4)) {code:java} – Expected output 0. – Actual output NULL {code} select cast("0. " as decimal(8,4)) {code:java} – Actual output 0. {code} > Cast decimal from string with space without digits before dot returns NULL > -- > > Key: HIVE-27885 > URL: https://issues.apache.org/jira/browse/HIVE-27885 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > > eg., > select cast(". " as decimal(8,4)) > > {code:java} > – Expected output > 0. > – Actual output > NULL > {code} > select cast("0. " as decimal(8,4)) > > {code:java} > – Actual output > 0. > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table
Naresh P R created HIVE-28213: - Summary: Incorrect results after insert-select from similar bucketed source & target table Key: HIVE-28213 URL: https://issues.apache.org/jira/browse/HIVE-28213 Project: Hive Issue Type: Bug Reporter: Naresh P R Attachments: test.q Insert-select is not honoring bucketing if both source & target are bucketed on same column. eg., {code:java} CREATE EXTERNAL TABLE bucketing_table1 (id INT) CLUSTERED BY (id) SORTED BY (id ASC) INTO 32 BUCKETS stored as textfile; INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5); CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1; INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code} id=1 => murmur_hash(1) %32 should go to 29th bucket file. bucketing_table1 has id=1 at 29th file, but bucketing_table2 doesn't have 29th file because Insert-select dint honor the bucketing. {code:java} SELECT count(*) FROM bucketing_table1 WHERE id = 1; === 1 //correct result SELECT count(*) FROM bucketing_table2 WHERE id = 1; === 0 // incorrect result{code} Workaround: hive.tez.bucket.pruning=false; PS: Attaching repro file [^test.q] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table
[ https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-28213: -- Description: Insert-select is not honoring bucketing if both source & target are bucketed on same column. eg., {code:java} CREATE EXTERNAL TABLE bucketing_table1 (id INT) CLUSTERED BY (id) SORTED BY (id ASC) INTO 32 BUCKETS stored as textfile; INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5); CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1; INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code} id=1 => murmur_hash(1) %32 should go to 29th bucket file. bucketing_table1 has id=1 at 29th file, but bucketing_table2 doesn't have 29th file because Insert-select dint honor the bucketing. {code:java} SELECT count(*) FROM bucketing_table1 WHERE id = 1; === 1 //correct result SELECT count(*) FROM bucketing_table2 WHERE id = 1; === 0 // incorrect result select *, INPUT__FILE__NAME from bucketing_table1; +--++ | bucketing_table1.id | input__file__name | +--++ | 2 | /bucketing_table1/04_0 | | 3 | /bucketing_table1/06_0 | | 5 | /bucketing_table1/15_0 | | 4 | /bucketing_table1/21_0 | | 1 | /bucketing_table1/29_0 | +--++ select *, INPUT__FILE__NAME from bucketing_table2; +-++ | bucketing_table2.id | input__file__name | +-++ | 2 | /bucketing_table2/00_0 | | 3 | /bucketing_table2/01_0 | | 5 | /bucketing_table2/02_0 | | 4 | /bucketing_table2/03_0 | | 1 | /bucketing_table2/04_0 | +--++{code} Workaround for read: hive.tez.bucket.pruning=false; PS: Attaching repro file [^test.q] was: Insert-select is not honoring bucketing if both source & target are bucketed on same column. eg., {code:java} CREATE EXTERNAL TABLE bucketing_table1 (id INT) CLUSTERED BY (id) SORTED BY (id ASC) INTO 32 BUCKETS stored as textfile; INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5); CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1; INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code} id=1 => murmur_hash(1) %32 should go to 29th bucket file. bucketing_table1 has id=1 at 29th file, but bucketing_table2 doesn't have 29th file because Insert-select dint honor the bucketing. {code:java} SELECT count(*) FROM bucketing_table1 WHERE id = 1; === 1 //correct result SELECT count(*) FROM bucketing_table2 WHERE id = 1; === 0 // incorrect result{code} Workaround: hive.tez.bucket.pruning=false; PS: Attaching repro file [^test.q] > Incorrect results after insert-select from similar bucketed source & target > table > - > > Key: HIVE-28213 > URL: https://issues.apache.org/jira/browse/HIVE-28213 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Priority: Major > Attachments: test.q > > > Insert-select is not honoring bucketing if both source & target are bucketed > on same column. > eg., > {code:java} > CREATE EXTERNAL TABLE bucketing_table1 (id INT) > CLUSTERED BY (id) > SORTED BY (id ASC) > INTO 32 BUCKETS stored as textfile; > INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5); > CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1; > INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code} > id=1 => murmur_hash(1) %32 should go to 29th bucket file. > bucketing_table1 has id=1 at 29th file, > but bucketing_table2 doesn't have 29th file because Insert-select dint honor > the bucketing. > {code:java} > SELECT count(*) FROM bucketing_table1 WHERE id = 1; > === > 1 //correct result > SELECT count(*) FROM bucketing_table2 WHERE id = 1; > === > 0 // incorrect result > select *, INPUT__FILE__NAME from bucketing_table1; > +--++ > | bucketing_table1.id | input__file__name | > +--++ > | 2 | /bucketing_table1/04_0 | > | 3 | /bucketing_table1/06_0 | > | 5 | /bucketing_table1/15_0 | > | 4 | /bucketing_table1/21_0 | > | 1
[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table
[ https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-28213: -- Description: Insert-select is not honoring bucketing if both source & target are bucketed on same column. eg., {code:java} CREATE EXTERNAL TABLE bucketing_table1 (id INT) CLUSTERED BY (id) SORTED BY (id ASC) INTO 32 BUCKETS stored as textfile; INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5); CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1; INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code} id=1 => murmur_hash(1) %32 should go to 29th bucket file. bucketing_table1 has id=1 at 29th file, but bucketing_table2 doesn't have 29th file because Insert-select dint honor the bucketing. {code:java} SELECT count(*) FROM bucketing_table1 WHERE id = 1; === 1 //correct result SELECT count(*) FROM bucketing_table2 WHERE id = 1; === 0 // incorrect result select *, INPUT__FILE__NAME from bucketing_table1; +--++ | bucketing_table1.id | input__file__name | +--++ | 2 | /bucketing_table1/04_0 | | 3 | /bucketing_table1/06_0 | | 5 | /bucketing_table1/15_0 | | 4 | /bucketing_table1/21_0 | | 1 | /bucketing_table1/29_0 | +--++ select *, INPUT__FILE__NAME from bucketing_table2; +-++ | bucketing_table2.id | input__file__name | +-++ | 2 | /bucketing_table2/00_0 | | 3 | /bucketing_table2/01_0 | | 5 | /bucketing_table2/02_0 | | 4 | /bucketing_table2/03_0 | | 1 | /bucketing_table2/04_0 | +--++{code} Query to identify in which bucketFile a particular row should be {code:java} with t as (select *, murmur_hash(id)%32 as bucket, INPUT__FILE__NAME from bucketing_table1) select id, (case when bucket > 0 then bucket else 32 + bucket end) as bucket_number, INPUT__FILE__NAME from t; +-+++ | id | bucket_number | input__file__name | +-+++ | 2 | 4 | /bucketing_table1/04_0 | | 3 | 6 | /bucketing_table1/06_0 | | 5 | 15 | /bucketing_table1/15_0 | | 4 | 21 | /bucketing_table1/21_0 | | 1 | 29 | /bucketing_table1/29_0 | +-+++{code} Workaround for read: hive.tez.bucket.pruning=false; PS: Attaching repro file [^test.q] was: Insert-select is not honoring bucketing if both source & target are bucketed on same column. eg., {code:java} CREATE EXTERNAL TABLE bucketing_table1 (id INT) CLUSTERED BY (id) SORTED BY (id ASC) INTO 32 BUCKETS stored as textfile; INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5); CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1; INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code} id=1 => murmur_hash(1) %32 should go to 29th bucket file. bucketing_table1 has id=1 at 29th file, but bucketing_table2 doesn't have 29th file because Insert-select dint honor the bucketing. {code:java} SELECT count(*) FROM bucketing_table1 WHERE id = 1; === 1 //correct result SELECT count(*) FROM bucketing_table2 WHERE id = 1; === 0 // incorrect result select *, INPUT__FILE__NAME from bucketing_table1; +--++ | bucketing_table1.id | input__file__name | +--++ | 2 | /bucketing_table1/04_0 | | 3 | /bucketing_table1/06_0 | | 5 | /bucketing_table1/15_0 | | 4 | /bucketing_table1/21_0 | | 1 | /bucketing_table1/29_0 | +--++ select *, INPUT__FILE__NAME from bucketing_table2; +-++ | bucketing_table2.id | input__file__name | +-++ | 2 | /bucketing_table2/00_0 | | 3 | /bucketing_table2/01_0 | | 5
[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table
[ https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-28213: -- Description: Insert-select is not honoring bucketing if both source & target are bucketed on same column. eg., {code:java} CREATE EXTERNAL TABLE bucketing_table1 (id INT) CLUSTERED BY (id) SORTED BY (id ASC) INTO 32 BUCKETS stored as textfile; INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5); CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1; INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code} id=1 => murmur_hash(1) %32 should go to 29th bucket file. bucketing_table1 has id=1 at 29th file, but bucketing_table2 doesn't have 29th file because Insert-select dint honor the bucketing. {code:java} SELECT count(*) FROM bucketing_table1 WHERE id = 1; === 1 //correct result SELECT count(*) FROM bucketing_table2 WHERE id = 1; === 0 // incorrect result select *, INPUT__FILE__NAME from bucketing_table1; +--++ | bucketing_table1.id | input__file__name | +--++ | 2 | /bucketing_table1/04_0 | | 3 | /bucketing_table1/06_0 | | 5 | /bucketing_table1/15_0 | | 4 | /bucketing_table1/21_0 | | 1 | /bucketing_table1/29_0 | +--++ select *, INPUT__FILE__NAME from bucketing_table2; +-++ | bucketing_table2.id | input__file__name | +-++ | 2 | /bucketing_table2/00_0 | | 3 | /bucketing_table2/01_0 | | 5 | /bucketing_table2/02_0 | | 4 | /bucketing_table2/03_0 | | 1 | /bucketing_table2/04_0 | +--++{code} Workaround for read: hive.tez.bucket.pruning=false; PS: Attaching repro file [^test.q] was: Insert-select is not honoring bucketing if both source & target are bucketed on same column. eg., {code:java} CREATE EXTERNAL TABLE bucketing_table1 (id INT) CLUSTERED BY (id) SORTED BY (id ASC) INTO 32 BUCKETS stored as textfile; INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5); CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1; INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code} id=1 => murmur_hash(1) %32 should go to 29th bucket file. bucketing_table1 has id=1 at 29th file, but bucketing_table2 doesn't have 29th file because Insert-select dint honor the bucketing. {code:java} SELECT count(*) FROM bucketing_table1 WHERE id = 1; === 1 //correct result SELECT count(*) FROM bucketing_table2 WHERE id = 1; === 0 // incorrect result select *, INPUT__FILE__NAME from bucketing_table1; +--++ | bucketing_table1.id | input__file__name | +--++ | 2 | /bucketing_table1/04_0 | | 3 | /bucketing_table1/06_0 | | 5 | /bucketing_table1/15_0 | | 4 | /bucketing_table1/21_0 | | 1 | /bucketing_table1/29_0 | +--++ select *, INPUT__FILE__NAME from bucketing_table2; +-++ | bucketing_table2.id | input__file__name | +-++ | 2 | /bucketing_table2/00_0 | | 3 | /bucketing_table2/01_0 | | 5 | /bucketing_table2/02_0 | | 4 | /bucketing_table2/03_0 | | 1 | /bucketing_table2/04_0 | +--++{code} Query to identify in which bucketFile a particular row should be {code:java} with t as (select *, murmur_hash(id)%32 as bucket, INPUT__FILE__NAME from bucketing_table1) select id, (case when bucket > 0 then bucket else 32 + bucket end) as bucket_number, INPUT__FILE__NAME from t; +-+++ | id | bucket_number | input__file__name | +-+++ | 2 | 4 | /bucketing_table1/04_0 | | 3 | 6 | /bucketing_table1/06_0 | | 5 | 15 |
[jira] [Commented] (HIVE-28117) add_months() with output_date_format returning wrong year on leap day
[ https://issues.apache.org/jira/browse/HIVE-28117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827257#comment-17827257 ] Naresh P R commented on HIVE-28117: --- Can you try your usecase with -MM format ? eg., select add_months(dt, -2, '-MM') > add_months() with output_date_format returning wrong year on leap day > - > > Key: HIVE-28117 > URL: https://issues.apache.org/jira/browse/HIVE-28117 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3 >Reporter: Jesse Petre >Priority: Minor > Attachments: 2024-03-11_12-11-11.png > > > I use an output_date_format option on the add_months() function like so: > {{select add_months(dt, -2, '-MM')}} > On leap day, 2024-02-29, this incorrectly returned 2024-12. I expected > 2023-12. All other days it works fine, only leap day it gave the wrong > result. > > Omitting the output date format will make it calculate the date correctly. > Including the output date format gives the wrong result. -- This message was sent by Atlassian Jira (v8.20.10#820010)