[jira] [Updated] (HIVE-27428) CTAS fails with SemanticException when join subquery has complex type column and false filter predicate

2023-06-09 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-27428:
--
Description: 
Repro steps:
{code:java}
drop table if exists table1;
drop table if exists table2;

create table table1 (a string, b string);
create table table2 (complex_column  create table table2 (complex_column 
array, values:array);

-- CTAS failing query
create table table3 as with t1 as (select * from table1), t2 as (select * from 
table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code}
Exception:
{code:java}
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the 
type, near field:  t2.complex_column
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730)
 
        at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831)
 
        at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442)
 
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300)
 
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) 
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194)  {code}

  was:
Repro steps:
{code:java}
drop table if exists table1;
drop table if exists table2;

create table table1 (a string, b string);
create table table2 (complex_column  create table table2 (complex_column 
array, values:array);

-- CTAS failing query
create table table3 as with t1 as (select * from table1), t2 as (select * from 
table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code}
Exception:
{code:java}
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the 
type, near field:  t2.df0rrd_prod_wers_x
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730)
 
        at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831)
 
        at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442)
 
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300)
 
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) 
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194)  {code}


> CTAS fails with SemanticException when join subquery has complex type column 
> and false filter predicate
> ---
>
> Key: HIVE-27428
> URL: https://issues.apache.org/jira/browse/HIVE-27428
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
>
> Repro steps:
> {code:java}
> drop table if exists table1;
> drop table if exists table2;
> create table table1 (a string, b string);
> create table table2 

[jira] [Created] (HIVE-27428) CTAS fails with SemanticException when join subquery has complex type column and false filter predicate

2023-06-09 Thread Naresh P R (Jira)
Naresh P R created HIVE-27428:
-

 Summary: CTAS fails with SemanticException when join subquery has 
complex type column and false filter predicate
 Key: HIVE-27428
 URL: https://issues.apache.org/jira/browse/HIVE-27428
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R


Repro steps:
{code:java}
drop table if exists table1;
drop table if exists table2;

create table table1 (a string, b string);
create table table2 (complex_column  create table table2 (complex_column 
array, values:array);

-- CTAS failing query
create table table3 as with t1 as (select * from table1), t2 as (select * from 
table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code}
Exception:
{code:java}
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the 
type, near field:  t2.df0rrd_prod_wers_x
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730)
 
        at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722)
 
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831)
 
        at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442)
 
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300)
 
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) 
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194)  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27964) Support drop stats similar to Impala

2023-12-21 Thread Naresh P R (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799522#comment-17799522
 ] 

Naresh P R commented on HIVE-27964:
---

Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, dropped & re-added with new tableName.

Instead clearing the stats before rename & computing later would help to speed 
up the process.

 

Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, 
PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS 
to avoid touching PART_COL_STATS for table/partition renames.

> Support drop stats similar to Impala
> 
>
> Key: HIVE-27964
> URL: https://issues.apache.org/jira/browse/HIVE-27964
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Priority: Major
>
> Hive should support drop stats similar to impala.
> https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27964) Support drop stats similar to Impala

2023-12-21 Thread Naresh P R (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799522#comment-17799522
 ] 

Naresh P R edited comment on HIVE-27964 at 12/21/23 5:47 PM:
-

Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

Instead clearing the stats before rename & computing later would help to speed 
up the process.

 

Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, 
PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS 
to avoid touching PART_COL_STATS for table/partition renames.


was (Author: nareshpr):
Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, dropped & re-added with new tableName.

Instead clearing the stats before rename & computing later would help to speed 
up the process.

 

Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, 
PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS 
to avoid touching PART_COL_STATS for table/partition renames.

> Support drop stats similar to Impala
> 
>
> Key: HIVE-27964
> URL: https://issues.apache.org/jira/browse/HIVE-27964
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Priority: Major
>
> Hive should support drop stats similar to impala.
> https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27965) Table/partition rename takes a long time at PART_COL_STATS for wide tables

2023-12-21 Thread Naresh P R (Jira)
Naresh P R created HIVE-27965:
-

 Summary: Table/partition rename takes a long time at 
PART_COL_STATS for wide tables
 Key: HIVE-27965
 URL: https://issues.apache.org/jira/browse/HIVE-27965
 Project: Hive
  Issue Type: Improvement
Reporter: Naresh P R


Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

 

Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use PART_ID as 
FOREIGN KEY from PARTITIONS to avoid touching PART_COL_STATS for 
table/partition renames.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27964) Support drop stats similar to Impala

2023-12-20 Thread Naresh P R (Jira)
Naresh P R created HIVE-27964:
-

 Summary: Support drop stats similar to Impala
 Key: HIVE-27964
 URL: https://issues.apache.org/jira/browse/HIVE-27964
 Project: Hive
  Issue Type: New Feature
Reporter: Naresh P R


Hive should support drop stats similar to impala.

https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27965) Table/partition rename takes a long time at PART_COL_STATS for wide tables

2023-12-21 Thread Naresh P R (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799649#comment-17799649
 ] 

Naresh P R commented on HIVE-27965:
---

[~zhangbutao] 

Yes, this helps. Thanks for letting me know.

> Table/partition rename takes a long time at PART_COL_STATS for wide tables
> --
>
> Key: HIVE-27965
> URL: https://issues.apache.org/jira/browse/HIVE-27965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naresh P R
>Priority: Major
>
> Partition table rename gets clogged at PART_COL_STATS for wide tables.
> {code:java}
> CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
>  ...
>  `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
>  `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
>  `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
> NULL, 
>   ...){code}
> Since PART_COL_STATS holds db_name & table_name, incase of table rename, 
> every row in PART_COL_STATS associated with the table should be fetched, 
> stored in memory, delete & re-insert with new db/table/partition name.
>  
> Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS, instead use 
> TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition 
> renames.
> Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27965) Table/partition rename takes a long time at PART_COL_STATS for wide tables

2023-12-21 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-27965:
--
Description: 
Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

 

Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use & use 
TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition 
renames.

Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING.

  was:
Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

 

Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use PART_ID as 
FOREIGN KEY from PARTITIONS to avoid touching PART_COL_STATS for 
table/partition renames.


> Table/partition rename takes a long time at PART_COL_STATS for wide tables
> --
>
> Key: HIVE-27965
> URL: https://issues.apache.org/jira/browse/HIVE-27965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naresh P R
>Priority: Major
>
> Partition table rename gets clogged at PART_COL_STATS for wide tables.
> {code:java}
> CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
>  ...
>  `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
>  `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
>  `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
> NULL, 
>   ...){code}
> Since PART_COL_STATS holds db_name & table_name, incase of table rename, 
> every row in PART_COL_STATS associated with the table should be fetched, 
> stored in memory, delete & re-insert with new db/table/partition name.
>  
> Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use & use 
> TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition 
> renames.
> Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27965) Table/partition rename takes a long time at PART_COL_STATS for wide tables

2023-12-21 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-27965:
--
Description: 
Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

 

Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS, instead use 
TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition 
renames.

Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING.

  was:
Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

 

Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS & use & use 
TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition 
renames.

Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING.


> Table/partition rename takes a long time at PART_COL_STATS for wide tables
> --
>
> Key: HIVE-27965
> URL: https://issues.apache.org/jira/browse/HIVE-27965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naresh P R
>Priority: Major
>
> Partition table rename gets clogged at PART_COL_STATS for wide tables.
> {code:java}
> CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
>  ...
>  `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
>  `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
>  `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
> NULL, 
>   ...){code}
> Since PART_COL_STATS holds db_name & table_name, incase of table rename, 
> every row in PART_COL_STATS associated with the table should be fetched, 
> stored in memory, delete & re-insert with new db/table/partition name.
>  
> Remove DB_NAME, TABLE_NAME, PARTITION_NAME from PART_COL_STATS, instead use 
> TBL_ID, DB_ID, PART_ID to avoid touching PART_COL_STATS for table/partition 
> renames.
> Also TBL_ID, DB_ID, PART_ID can be used for PART_COL_STATS INDEXING.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27964) Support drop stats similar to Impala

2023-12-21 Thread Naresh P R (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799522#comment-17799522
 ] 

Naresh P R edited comment on HIVE-27964 at 12/21/23 6:17 PM:
-

Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

Instead clearing the stats before rename & computing later would help to speed 
up the process.

 

Just raised another optimization HIVE-27965, to remove DB_NAME, TABLE_NAME, 
PARTITION_NAME from PART_COL_STATS & use TBL_ID, DB_ID, PART_ID to avoid 
touching PART_COL_STATS for table/partition renames + can be used in indexes as 
well.


was (Author: nareshpr):
Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

Instead clearing the stats before rename & computing later would help to speed 
up the process.

 

Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, 
PARTITION_NAME from PART_COL_STATS & use TBL_ID, DB_ID, PART_ID to avoid 
touching PART_COL_STATS for table/partition renames + can be used in indexes as 
well.

> Support drop stats similar to Impala
> 
>
> Key: HIVE-27964
> URL: https://issues.apache.org/jira/browse/HIVE-27964
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Priority: Major
>
> Hive should support drop stats similar to impala.
> https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27964) Support drop stats similar to Impala

2023-12-21 Thread Naresh P R (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799522#comment-17799522
 ] 

Naresh P R edited comment on HIVE-27964 at 12/21/23 6:10 PM:
-

Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

Instead clearing the stats before rename & computing later would help to speed 
up the process.

 

Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, 
PARTITION_NAME from PART_COL_STATS & use TBL_ID, DB_ID, PART_ID to avoid 
touching PART_COL_STATS for table/partition renames + can be used in indexes as 
well.


was (Author: nareshpr):
Partition table rename gets clogged at PART_COL_STATS for wide tables.
{code:java}
CREATE TABLE IF NOT EXISTS `PART_COL_STATS` (
 ...
 `DB_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `TABLE_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `PARTITION_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT 
NULL, 
  ...){code}
Since PART_COL_STATS holds db_name & table_name, incase of table rename, every 
row in PART_COL_STATS associated with the table should be fetched, stored in 
memory, delete & re-insert with new db/table/partition name.

Instead clearing the stats before rename & computing later would help to speed 
up the process.

 

Another optimization i was about to raise is to remove DB_NAME, TABLE_NAME, 
PARTITION_NAME from PART_COL_STATS & use PART_ID as FOREIGN KEY from PARTITIONS 
to avoid touching PART_COL_STATS for table/partition renames.

> Support drop stats similar to Impala
> 
>
> Key: HIVE-27964
> URL: https://issues.apache.org/jira/browse/HIVE-27964
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Priority: Major
>
> Hive should support drop stats similar to impala.
> https://impala.apache.org/docs/build/html/topics/impala_drop_stats.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL

2023-11-24 Thread Naresh P R (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789609#comment-17789609
 ] 

Naresh P R commented on HIVE-27885:
---

Thank you [~rameshkumar] & [~ngangam] for the review and commit.

> Cast decimal from string with space without digits before dot returns NULL
> --
>
> Key: HIVE-27885
> URL: https://issues.apache.org/jira/browse/HIVE-27885
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> eg.,
> select cast(". " as decimal(8,4))
> {code:java}
> – Expected output
> 0.
> – Actual output
> NULL
> {code}
> select cast("0. " as decimal(8,4))
> {code:java}
> – Actual output
> 0.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy

2023-11-15 Thread Naresh P R (Jira)
Naresh P R created HIVE-27876:
-

 Summary: Incorrect query results on tables with ClusterBy & SortBy
 Key: HIVE-27876
 URL: https://issues.apache.org/jira/browse/HIVE-27876
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R


Repro:

 
{code:java}
create external table test_bucket(age int, name string, dept string) clustered 
by (age, name) sorted by (age asc, name asc) into 2 buckets stored as orc;
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');

//empty wrong results with default CDP configs
select age, name, count(*) from test_bucket group by  age, name having count(*) 
> 1; 
+--+---+--+
| age  | name  | _c2  |
+--+---+--+
+--+---+--+

// Workaround
set hive.map.aggr=false;
select age, name, count(*) from test_bucket group by  age, name having count(*) 
> 1; 
+--++--+
| age  |  name  | _c2  |
+--++--+
| 1    | user1  | 2    |
| 2    | user2  | 2    |
+--++--+ {code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy

2023-11-15 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-27876:
--
Description: 
Repro:

 
{code:java}
create external table test_bucket(age int, name string, dept string) clustered 
by (age, name) sorted by (age asc, name asc) into 2 buckets stored as orc;
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');

//empty wrong results
select age, name, count(*) from test_bucket group by  age, name having count(*) 
> 1; 
+--+---+--+
| age  | name  | _c2  |
+--+---+--+
+--+---+--+

// Workaround
set hive.map.aggr=false;
select age, name, count(*) from test_bucket group by  age, name having count(*) 
> 1; 
+--++--+
| age  |  name  | _c2  |
+--++--+
| 1    | user1  | 2    |
| 2    | user2  | 2    |
+--++--+ {code}
 

 

  was:
Repro:

 
{code:java}
create external table test_bucket(age int, name string, dept string) clustered 
by (age, name) sorted by (age asc, name asc) into 2 buckets stored as orc;
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');

//empty wrong results with default CDP configs
select age, name, count(*) from test_bucket group by  age, name having count(*) 
> 1; 
+--+---+--+
| age  | name  | _c2  |
+--+---+--+
+--+---+--+

// Workaround
set hive.map.aggr=false;
select age, name, count(*) from test_bucket group by  age, name having count(*) 
> 1; 
+--++--+
| age  |  name  | _c2  |
+--++--+
| 1    | user1  | 2    |
| 2    | user2  | 2    |
+--++--+ {code}
 

 


> Incorrect query results on tables with ClusterBy & SortBy
> -
>
> Key: HIVE-27876
> URL: https://issues.apache.org/jira/browse/HIVE-27876
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
>
> Repro:
>  
> {code:java}
> create external table test_bucket(age int, name string, dept string) 
> clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored 
> as orc;
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> //empty wrong results
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--+---+--+
> | age  | name  | _c2  |
> +--+---+--+
> +--+---+--+
> // Workaround
> set hive.map.aggr=false;
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--++--+
> | age  |  name  | _c2  |
> +--++--+
> | 1    | user1  | 2    |
> | 2    | user2  | 2    |
> +--++--+ {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL

2023-11-17 Thread Naresh P R (Jira)
Naresh P R created HIVE-27885:
-

 Summary: Cast decimal from string with space without digits before 
dot returns NULL
 Key: HIVE-27885
 URL: https://issues.apache.org/jira/browse/HIVE-27885
 Project: Hive
  Issue Type: Bug
 Environment: eg.,

select cast(". " as decimal(8,4))

-- Expected output

0.

-- Actual output

NULL

 

select cast("0. " as decimal(8,4))

-- Actual output

0.
Reporter: Naresh P R
Assignee: Naresh P R






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL

2023-11-17 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-27885:
--
Environment: (was: eg.,

select cast(". " as decimal(8,4))

-- Expected output

0.

-- Actual output

NULL

 

select cast("0. " as decimal(8,4))

-- Actual output

0.)

> Cast decimal from string with space without digits before dot returns NULL
> --
>
> Key: HIVE-27885
> URL: https://issues.apache.org/jira/browse/HIVE-27885
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL

2023-11-17 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-27885:
--
Description: 
eg.,

select cast(". " as decimal(8,4))
{code:java}
– Expected output
0.
– Actual output
NULL
{code}
select cast("0. " as decimal(8,4))
{code:java}
– Actual output
0.
{code}

  was:
eg.,

select cast(". " as decimal(8,4))

 
{code:java}
– Expected output
0.
– Actual output
NULL
{code}
select cast("0. " as decimal(8,4))

 
{code:java}
– Actual output
0.
{code}
 


> Cast decimal from string with space without digits before dot returns NULL
> --
>
> Key: HIVE-27885
> URL: https://issues.apache.org/jira/browse/HIVE-27885
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> eg.,
> select cast(". " as decimal(8,4))
> {code:java}
> – Expected output
> 0.
> – Actual output
> NULL
> {code}
> select cast("0. " as decimal(8,4))
> {code:java}
> – Actual output
> 0.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL

2023-11-17 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-27885:
--
Description: 
eg.,

select cast(". " as decimal(8,4))

 
{code:java}
– Expected output
0.
– Actual output
NULL
{code}
select cast("0. " as decimal(8,4))

 
{code:java}
– Actual output
0.
{code}
 

> Cast decimal from string with space without digits before dot returns NULL
> --
>
> Key: HIVE-27885
> URL: https://issues.apache.org/jira/browse/HIVE-27885
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> eg.,
> select cast(". " as decimal(8,4))
>  
> {code:java}
> – Expected output
> 0.
> – Actual output
> NULL
> {code}
> select cast("0. " as decimal(8,4))
>  
> {code:java}
> – Actual output
> 0.
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table

2024-04-24 Thread Naresh P R (Jira)
Naresh P R created HIVE-28213:
-

 Summary: Incorrect results after insert-select from similar 
bucketed source & target table
 Key: HIVE-28213
 URL: https://issues.apache.org/jira/browse/HIVE-28213
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R
 Attachments: test.q

Insert-select is not honoring bucketing if both source & target are bucketed on 
same column.

eg., 
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;

INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) %32 should go to 29th bucket file.

bucketing_table1 has id=1 at 29th file,

but bucketing_table2 doesn't have 29th file because Insert-select dint honor 
the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 //correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;  
===
0 // incorrect result{code}
Workaround: hive.tez.bucket.pruning=false;

PS: Attaching repro file [^test.q]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table

2024-04-24 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-28213:
--
Description: 
Insert-select is not honoring bucketing if both source & target are bucketed on 
same column.

eg., 
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;

INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) %32 should go to 29th bucket file.

bucketing_table1 has id=1 at 29th file,

but bucketing_table2 doesn't have 29th file because Insert-select dint honor 
the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 //correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;   
=== 
0 // incorrect result


select *, INPUT__FILE__NAME from bucketing_table1;
+--++
| bucketing_table1.id  |                 input__file__name                  |
+--++
| 2                    | /bucketing_table1/04_0 |
| 3                    | /bucketing_table1/06_0 |
| 5                    | /bucketing_table1/15_0 |
| 4                    | /bucketing_table1/21_0 |
| 1                    | /bucketing_table1/29_0 |
+--++

select *, INPUT__FILE__NAME from bucketing_table2;
+-++
| bucketing_table2.id  |                 input__file__name                  |
+-++
| 2           | /bucketing_table2/00_0 |
| 3           | /bucketing_table2/01_0 |
| 5           | /bucketing_table2/02_0 |
| 4           | /bucketing_table2/03_0 |
| 1           | /bucketing_table2/04_0 |
+--++{code}
Workaround for read: hive.tez.bucket.pruning=false;

PS: Attaching repro file [^test.q]

  was:
Insert-select is not honoring bucketing if both source & target are bucketed on 
same column.

eg., 
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;

INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) %32 should go to 29th bucket file.

bucketing_table1 has id=1 at 29th file,

but bucketing_table2 doesn't have 29th file because Insert-select dint honor 
the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 //correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;  
===
0 // incorrect result{code}
Workaround: hive.tez.bucket.pruning=false;

PS: Attaching repro file [^test.q]


> Incorrect results after insert-select from similar bucketed source & target 
> table
> -
>
> Key: HIVE-28213
> URL: https://issues.apache.org/jira/browse/HIVE-28213
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
> Attachments: test.q
>
>
> Insert-select is not honoring bucketing if both source & target are bucketed 
> on same column.
> eg., 
> {code:java}
> CREATE EXTERNAL TABLE bucketing_table1 (id INT)
> CLUSTERED BY (id)
> SORTED BY (id ASC)
> INTO 32 BUCKETS stored as textfile;
> INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);
> CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
> INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
> id=1 => murmur_hash(1) %32 should go to 29th bucket file.
> bucketing_table1 has id=1 at 29th file,
> but bucketing_table2 doesn't have 29th file because Insert-select dint honor 
> the bucketing.
> {code:java}
> SELECT count(*) FROM bucketing_table1 WHERE id = 1;
> ===
> 1 //correct result
> SELECT count(*) FROM bucketing_table2 WHERE id = 1;   
> === 
> 0 // incorrect result
> select *, INPUT__FILE__NAME from bucketing_table1;
> +--++
> | bucketing_table1.id  |                 input__file__name                  |
> +--++
> | 2                    | /bucketing_table1/04_0 |
> | 3                    | /bucketing_table1/06_0 |
> | 5                    | /bucketing_table1/15_0 |
> | 4                    | /bucketing_table1/21_0 |
> | 1 

[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table

2024-04-24 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-28213:
--
Description: 
Insert-select is not honoring bucketing if both source & target are bucketed on 
same column.

eg., 
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;

INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) %32 should go to 29th bucket file.

bucketing_table1 has id=1 at 29th file,

but bucketing_table2 doesn't have 29th file because Insert-select dint honor 
the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 //correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;   
=== 
0 // incorrect result


select *, INPUT__FILE__NAME from bucketing_table1;
+--++
| bucketing_table1.id  |                 input__file__name                  |
+--++
| 2                    | /bucketing_table1/04_0 |
| 3                    | /bucketing_table1/06_0 |
| 5                    | /bucketing_table1/15_0 |
| 4                    | /bucketing_table1/21_0 |
| 1                    | /bucketing_table1/29_0 |
+--++

select *, INPUT__FILE__NAME from bucketing_table2;
+-++
| bucketing_table2.id  |                 input__file__name                  |
+-++
| 2           | /bucketing_table2/00_0 |
| 3           | /bucketing_table2/01_0 |
| 5           | /bucketing_table2/02_0 |
| 4           | /bucketing_table2/03_0 |
| 1           | /bucketing_table2/04_0 |
+--++{code}
Query to identify in which bucketFile a particular row should be

 
{code:java}
with t as (select *, murmur_hash(id)%32 as bucket, INPUT__FILE__NAME from 
bucketing_table1)
select id, (case when bucket > 0 then bucket else 32 + bucket end) as 
bucket_number, INPUT__FILE__NAME from t; 
+-+++
| id  | bucket_number  |                 input__file__name                  |
+-+++
| 2   | 4              | /bucketing_table1/04_0 |
| 3   | 6              | /bucketing_table1/06_0 |
| 5   | 15             | /bucketing_table1/15_0 |
| 4   | 21             | /bucketing_table1/21_0 |
| 1   | 29             | /bucketing_table1/29_0 |
+-+++{code}
 

Workaround for read: hive.tez.bucket.pruning=false;

PS: Attaching repro file [^test.q]

  was:
Insert-select is not honoring bucketing if both source & target are bucketed on 
same column.

eg., 
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;

INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) %32 should go to 29th bucket file.

bucketing_table1 has id=1 at 29th file,

but bucketing_table2 doesn't have 29th file because Insert-select dint honor 
the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 //correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;   
=== 
0 // incorrect result


select *, INPUT__FILE__NAME from bucketing_table1;
+--++
| bucketing_table1.id  |                 input__file__name                  |
+--++
| 2                    | /bucketing_table1/04_0 |
| 3                    | /bucketing_table1/06_0 |
| 5                    | /bucketing_table1/15_0 |
| 4                    | /bucketing_table1/21_0 |
| 1                    | /bucketing_table1/29_0 |
+--++

select *, INPUT__FILE__NAME from bucketing_table2;
+-++
| bucketing_table2.id  |                 input__file__name                  |
+-++
| 2           | /bucketing_table2/00_0 |
| 3           | /bucketing_table2/01_0 |
| 5     

[jira] [Updated] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table

2024-04-24 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-28213:
--
Description: 
Insert-select is not honoring bucketing if both source & target are bucketed on 
same column.

eg., 
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;

INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) %32 should go to 29th bucket file.

bucketing_table1 has id=1 at 29th file,

but bucketing_table2 doesn't have 29th file because Insert-select dint honor 
the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 //correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;   
=== 
0 // incorrect result


select *, INPUT__FILE__NAME from bucketing_table1;
+--++
| bucketing_table1.id  |                 input__file__name                  |
+--++
| 2                    | /bucketing_table1/04_0 |
| 3                    | /bucketing_table1/06_0 |
| 5                    | /bucketing_table1/15_0 |
| 4                    | /bucketing_table1/21_0 |
| 1                    | /bucketing_table1/29_0 |
+--++

select *, INPUT__FILE__NAME from bucketing_table2;
+-++
| bucketing_table2.id  |                 input__file__name                  |
+-++
| 2           | /bucketing_table2/00_0 |
| 3           | /bucketing_table2/01_0 |
| 5           | /bucketing_table2/02_0 |
| 4           | /bucketing_table2/03_0 |
| 1           | /bucketing_table2/04_0 |
+--++{code}
Workaround for read: hive.tez.bucket.pruning=false;

PS: Attaching repro file [^test.q]

  was:
Insert-select is not honoring bucketing if both source & target are bucketed on 
same column.

eg., 
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS stored as textfile;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;

INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
id=1 => murmur_hash(1) %32 should go to 29th bucket file.

bucketing_table1 has id=1 at 29th file,

but bucketing_table2 doesn't have 29th file because Insert-select dint honor 
the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 //correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;   
=== 
0 // incorrect result


select *, INPUT__FILE__NAME from bucketing_table1;
+--++
| bucketing_table1.id  |                 input__file__name                  |
+--++
| 2                    | /bucketing_table1/04_0 |
| 3                    | /bucketing_table1/06_0 |
| 5                    | /bucketing_table1/15_0 |
| 4                    | /bucketing_table1/21_0 |
| 1                    | /bucketing_table1/29_0 |
+--++

select *, INPUT__FILE__NAME from bucketing_table2;
+-++
| bucketing_table2.id  |                 input__file__name                  |
+-++
| 2           | /bucketing_table2/00_0 |
| 3           | /bucketing_table2/01_0 |
| 5           | /bucketing_table2/02_0 |
| 4           | /bucketing_table2/03_0 |
| 1           | /bucketing_table2/04_0 |
+--++{code}
Query to identify in which bucketFile a particular row should be

 
{code:java}
with t as (select *, murmur_hash(id)%32 as bucket, INPUT__FILE__NAME from 
bucketing_table1)
select id, (case when bucket > 0 then bucket else 32 + bucket end) as 
bucket_number, INPUT__FILE__NAME from t; 
+-+++
| id  | bucket_number  |                 input__file__name                  |
+-+++
| 2   | 4              | /bucketing_table1/04_0 |
| 3   | 6              | /bucketing_table1/06_0 |
| 5   | 15             | 

[jira] [Commented] (HIVE-28117) add_months() with output_date_format returning wrong year on leap day

2024-03-14 Thread Naresh P R (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827257#comment-17827257
 ] 

Naresh P R commented on HIVE-28117:
---

Can you try your usecase with -MM format ?

eg.,

select add_months(dt, -2, '-MM')

> add_months() with output_date_format returning wrong year on leap day
> -
>
> Key: HIVE-28117
> URL: https://issues.apache.org/jira/browse/HIVE-28117
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Jesse Petre
>Priority: Minor
> Attachments: 2024-03-11_12-11-11.png
>
>
> I use an output_date_format option on the add_months() function like so:
> {{select add_months(dt, -2, '-MM')}}
> On leap day, 2024-02-29, this incorrectly returned 2024-12.  I expected 
> 2023-12.  All other days it works fine, only leap day it gave the wrong 
> result.
>  
> Omitting the output date format will make it calculate the date correctly.  
> Including the output date format gives the wrong result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


<    1   2   3   4