[jira] [Updated] (HIVE-27531) Unparse identifiers in show create table output

2023-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27531:
--
Labels: pull-request-available  (was: )

> Unparse identifiers in show create table output
> ---
>
> Key: HIVE-27531
> URL: https://issues.apache.org/jira/browse/HIVE-27531
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> Currently, {{SHOW CREATE TABLE}} on tables with constraints returns {{ALTER 
> TABLE}} statements with incompatible constraint names. Running these ALTER 
> TABLE statements throws a ParseException.
> For example:
> {code:java}
> 0: jdbc:hive2://localhost:11050/default> ALTER TABLE reason ADD CONSTRAINT 
> 2e47abb2-b6c7-450a-8229-395d6b1ff168 PRIMARY KEY (r_reason_sk) DISABLE 
> NOVALIDATE RELY;
> Error: Error while compiling statement: FAILED: ParseException line 1:43 
> cannot recognize input near 'CONSTRAINT' '2e47abb2' '-' in add constraint 
> statement (state=42000,code=4) {code}
> Ideally all identifiers should be unparsed in the output of show create table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27531) Unparse identifiers in show create table output

2023-10-05 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-27531:
---
Description: 
Currently, {{SHOW CREATE TABLE}} on tables with constraints returns {{ALTER 
TABLE}} statements with incompatible constraint names. Running these ALTER TABLE 
statements throws a ParseException.

For example:
{code:java}
0: jdbc:hive2://localhost:11050/default> ALTER TABLE reason ADD CONSTRAINT 
2e47abb2-b6c7-450a-8229-395d6b1ff168 PRIMARY KEY (r_reason_sk) DISABLE 
NOVALIDATE RELY;
Error: Error while compiling statement: FAILED: ParseException line 1:43 cannot 
recognize input near 'CONSTRAINT' '2e47abb2' '-' in add constraint statement 
(state=42000,code=4) {code}
Ideally all identifiers should be unparsed in the output of show create table.
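
A minimal, self-contained sketch of the quoting step (illustrative only; the helper name and the backtick-escaping rule are assumptions based on HiveQL's quoted-identifier syntax, not the actual Hive implementation):
{code:java}
// Illustrative sketch: quote an identifier with backticks so that names such as
// UUID-style constraint names parse again when the generated DDL is replayed.
public final class IdentifierQuoting {

  // Wrap the identifier in backticks and escape embedded backticks by doubling
  // them, which is how HiveQL quoted identifiers are written.
  static String unparseIdentifier(String identifier) {
    return "`" + identifier.replace("`", "``") + "`";
  }

  public static void main(String[] args) {
    String constraint = "2e47abb2-b6c7-450a-8229-395d6b1ff168";
    // Prints the ALTER TABLE statement with every identifier backtick-quoted,
    // so the generated DDL round-trips through the parser.
    System.out.println("ALTER TABLE " + unparseIdentifier("reason")
        + " ADD CONSTRAINT " + unparseIdentifier(constraint)
        + " PRIMARY KEY (" + unparseIdentifier("r_reason_sk")
        + ") DISABLE NOVALIDATE RELY;");
  }
}
{code}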

  was:
Currently, {{SHOW CREATE TABLE}} on tables with constraints returns {{ALTER 
TABLE}} statements with incompatible constraint names. Running these ALTER TABLE 
statements throws a ParseException.

For example:
{code:java}
0: jdbc:hive2://localhost:11050/default> ALTER TABLE reason ADD CONSTRAINT 
2e47abb2-b6c7-450a-8229-395d6b1ff168 PRIMARY KEY (r_reason_sk) DISABLE 
NOVALIDATE RELY;
Error: Error while compiling statement: FAILED: ParseException line 1:43 cannot 
recognize input near 'CONSTRAINT' '2e47abb2' '-' in add constraint statement 
(state=42000,code=4) {code}
 


> Unparse identifiers in show create table output
> ---
>
> Key: HIVE-27531
> URL: https://issues.apache.org/jira/browse/HIVE-27531
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>
> Currently, {{SHOW CREATE TABLE}} on tables with constraints returns {{ALTER 
> TABLE}} statements with incompatible constraint names. Running these ALTER 
> TABLE statements throws a ParseException.
> For example:
> {code:java}
> 0: jdbc:hive2://localhost:11050/default> ALTER TABLE reason ADD CONSTRAINT 
> 2e47abb2-b6c7-450a-8229-395d6b1ff168 PRIMARY KEY (r_reason_sk) DISABLE 
> NOVALIDATE RELY;
> Error: Error while compiling statement: FAILED: ParseException line 1:43 
> cannot recognize input near 'CONSTRAINT' '2e47abb2' '-' in add constraint 
> statement (state=42000,code=4) {code}
> Ideally all identifiers should be unparsed in the output of show create table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27531) Unparse identifiers in show create table output

2023-10-05 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-27531:
---
Summary: Unparse identifiers in show create table output  (was: Incorrect 
constraint name format in the output of show create table)

> Unparse identifiers in show create table output
> ---
>
> Key: HIVE-27531
> URL: https://issues.apache.org/jira/browse/HIVE-27531
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>
> Currently, {{SHOW CREATE TABLE}} on tables with constraints returns {{ALTER 
> TABLE}} statements with incompatible constraint names. Running these ALTER 
> TABLE statements throws a ParseException.
> For example:
> {code:java}
> 0: jdbc:hive2://localhost:11050/default> ALTER TABLE reason ADD CONSTRAINT 
> 2e47abb2-b6c7-450a-8229-395d6b1ff168 PRIMARY KEY (r_reason_sk) DISABLE 
> NOVALIDATE RELY;
> Error: Error while compiling statement: FAILED: ParseException line 1:43 
> cannot recognize input near 'CONSTRAINT' '2e47abb2' '-' in add constraint 
> statement (state=42000,code=4) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27773) get_valid_write_ids is being called multiple times for a single query

2023-10-05 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan updated HIVE-27773:

Description: 
The logs below suggest that get_valid_write_ids is not cached per table within a 
single query: it is called multiple times across different phases of query 
compilation. We should verify whether we can safely cache and reuse the results; 
that would save roughly 40-50 ms out of the 678 ms compilation time.

 
{code:java}
2023-09-19T02:55:06,940 INFO [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] ql.Driver: Compiling 
command(queryId=rameshkumar_20230919025506_b005cc57-1717-4798-b8da-b502aa7ca3d6):
2023-09-19T02:55:06,967 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,979 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,980 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,986 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,988 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,995 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,997 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,007 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,009 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,017 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,018 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,026 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,059 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,068 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,618 INFO [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] ql.Driver: Completed compiling 
command(queryId=rameshkumar_20230919025506_b005cc57-1717-4798-b8da-b502aa7ca3d6);
 Time taken: 0.678 seconds{code}
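
A minimal sketch of the kind of per-compilation caching this suggests (class and method names here are assumptions for illustration, not the actual Hive driver code): cache the valid write-id result per fully qualified table name so that later compilation phases reuse the first metastore call.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative per-query cache: the first lookup for a table performs the
// expensive metastore call; later phases of the same compilation reuse it.
final class ValidWriteIdCache {
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  // 'metastoreCall' stands in for the real get_valid_write_ids RPC.
  String getValidWriteIds(String fullTableName, Function<String, String> metastoreCall) {
    return cache.computeIfAbsent(fullTableName, metastoreCall);
  }

  // Dropped at the end of compilation so a new query always sees fresh ids.
  void clear() {
    cache.clear();
  }
}
{code}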

  was:
The logs below suggest that get_valid_write_ids is not cached per table within a 
single query: it is called multiple times across different phases of query 
compilation. We should verify whether we can safely cache and reuse the results; 
that would save roughly 40-50 ms out of the 678 ms compilation time.

 
{code:java}
2023-09-19T02:55:06,940 INFO [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] ql.Driver: Compiling 
command(queryId=rameshkumar_20230919025506_b005cc57-1717-4798-b8da-b502aa7ca3d6):
2023-09-19T02:55:06,967 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,979 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,980 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,986 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,988 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,995 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,997 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,007 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,009 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,017 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,018 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,026 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,059 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,068 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,618 INFO [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] ql.Driver: Completed 

[jira] [Work started] (HIVE-27773) get_valid_write_ids is being called multiple times for a single query

2023-10-05 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27773 started by Ramesh Kumar Thangarajan.
---
> get_valid_write_ids is being called multiple times for a single query
> -
>
> Key: HIVE-27773
> URL: https://issues.apache.org/jira/browse/HIVE-27773
> Project: Hive
>  Issue Type: Task
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>
> The logs below suggest that get_valid_write_ids is not cached per table within 
> a single query: it is called multiple times across different phases of query 
> compilation. We should verify whether we can safely cache and reuse the 
> results; that would save roughly 40-50 ms out of the 678 ms compilation time.
>  
> {code:java}
> 2023-09-19T02:55:06,940 INFO [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] ql.Driver: Compiling 
> command(queryId=rameshkumar_20230919025506_b005cc57-1717-4798-b8da-b502aa7ca3d6):
> 2023-09-19T02:55:06,967 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2023-09-19T02:55:06,979 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  start=1695117306967 end=1695117306979 duration=12 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler retryCount=0 
> error=false>
> 2023-09-19T02:55:06,980 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2023-09-19T02:55:06,986 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  start=1695117306980 end=1695117306986 duration=6 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler retryCount=0 
> error=false>
> 2023-09-19T02:55:06,988 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2023-09-19T02:55:06,995 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  start=1695117306988 end=1695117306995 duration=7 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler retryCount=0 
> error=false>
> 2023-09-19T02:55:06,997 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2023-09-19T02:55:07,007 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  start=1695117306997 end=1695117307007 duration=10 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler retryCount=0 
> error=false>
> 2023-09-19T02:55:07,009 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2023-09-19T02:55:07,017 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  start=1695117307009 end=1695117307017 duration=8 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler retryCount=0 
> error=false>
> 2023-09-19T02:55:07,018 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2023-09-19T02:55:07,026 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  start=1695117307018 end=1695117307026 duration=8 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler retryCount=0 
> error=false>
> 2023-09-19T02:55:07,059 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> 2023-09-19T02:55:07,068 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] metrics.PerfLogger:  start=1695117307059 end=1695117307068 duration=9 
> from=org.apache.hadoop.hive.metastore.RetryingHMSHandler retryCount=0 
> error=false>
> 2023-09-19T02:55:07,618 INFO [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener 
> at 0.0.0.0/50501] ql.Driver: Completed compiling 
> command(queryId=rameshkumar_20230919025506_b005cc57-1717-4798-b8da-b502aa7ca3d6);
>  Time taken: 0.678 seconds{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27773) get_valid_write_ids is being called multiple times for a single query

2023-10-05 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-27773:
---

 Summary: get_valid_write_ids is being called multiple times for a 
single query
 Key: HIVE-27773
 URL: https://issues.apache.org/jira/browse/HIVE-27773
 Project: Hive
  Issue Type: Task
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan


The logs below suggest that get_valid_write_ids is not cached per table within a 
single query: it is called multiple times across different phases of query 
compilation. We should verify whether we can safely cache and reuse the results; 
that would save roughly 40-50 ms out of the 678 ms compilation time.

 
{code:java}
2023-09-19T02:55:06,940 INFO [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] ql.Driver: Compiling 
command(queryId=rameshkumar_20230919025506_b005cc57-1717-4798-b8da-b502aa7ca3d6):
2023-09-19T02:55:06,967 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,979 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,980 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,986 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,988 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,995 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:06,997 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,007 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,009 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,017 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,018 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,026 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,059 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,068 DEBUG [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] metrics.PerfLogger: 
2023-09-19T02:55:07,618 INFO [fa0fa087-7e2c-45b8-bd27-b94fbbe23e49 Listener at 
0.0.0.0/50501] ql.Driver: Completed compiling 
command(queryId=rameshkumar_20230919025506_b005cc57-1717-4798-b8da-b502aa7ca3d6);
 Time taken: 0.678 seconds{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() should return null for invalid dates

2023-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27772:
--
Labels: pull-request-available  (was: )

> Hive UNIX_TIMESTAMP() should return null for invalid dates
> -
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns 
> the timestamp of the last valid date of the month rather than NULL (e.g. 
> UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to 
> '2001-02-28'). However, for day-of-month values larger than 31, e.g. 
> 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() does return NULL.
> In Spark and MySQL, UNIX_TIMESTAMP() returns NULL (or 0) for all of these 
> invalid dates.
>  
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
> unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from 
> datetimetable;
> INFO  : Compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.102 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : Completed executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.0 seconds
> +--------+----------------+---------------+
> | month  | datetimestamp  | timestampcol  |
> +--------+----------------+---------------+
> | Feb    | 2001-02-28     | 983318400     |
> | Feb    | 2001-02-29     | 983318400     |
> | Feb    | 2001-02-30     | 983318400     |
> | Feb    | 2001-02-31     | 983318400     |
> | Feb    | 2001-02-32     | NULL          |
> +--------+----------------+---------------+
> 5 rows selected (0.131 seconds){noformat}
>  
>  
> It looks like the formatter in 
> [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
> uses the SMART resolver style by default.
> According to the JDK source: 
> https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103
>  
> {noformat}
>   /**
>      * Style to resolve dates and times strictly.
>      * 
>      * Using strict resolution will ensure that all parsed values are within
>      * the outer range of valid values for the field. Individual fields may
>      * be further processed for strictness.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using strict mode will ensure that the day-of-month is valid
>      * for the year-month, rejecting invalid values.
>      */
>     STRICT,
>     /**
>      * Style to resolve dates and times in a smart, or intelligent, manner.
>      * 
>      * Using smart resolution will perform the sensible default for each
>      * field, which may be the same as strict, the same as lenient, or a third
>      * behavior. Individual fields will interpret this differently.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using smart mode will ensure that the day-of-month is from
>      * 1 to 31, converting any value beyond the last valid day-of-month to be
>      * the last valid day-of-month.
>      */
>     SMART,{noformat}
>  
>  
> Therefore, we should set the resolverStyle to STRICT to reject invalid date 
> values.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() should return null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Summary: Hive UNIX_TIMESTAMP() should return null for invalid dates  (was: 
Hive UNIX_TIMESTAMP() not returning null for invalid dates)

> Hive UNIX_TIMESTAMP() should return null for invalid dates
> -
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns 
> the timestamp of the last valid date of the month rather than NULL (e.g. 
> UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to 
> '2001-02-28'). However, for day-of-month values larger than 31, e.g. 
> 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() does return NULL.
> In Spark and MySQL, UNIX_TIMESTAMP() returns NULL (or 0) for all of these 
> invalid dates.
>  
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
> unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from 
> datetimetable;
> INFO  : Compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.102 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : Completed executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.0 seconds
> +--------+----------------+---------------+
> | month  | datetimestamp  | timestampcol  |
> +--------+----------------+---------------+
> | Feb    | 2001-02-28     | 983318400     |
> | Feb    | 2001-02-29     | 983318400     |
> | Feb    | 2001-02-30     | 983318400     |
> | Feb    | 2001-02-31     | 983318400     |
> | Feb    | 2001-02-32     | NULL          |
> +--------+----------------+---------------+
> 5 rows selected (0.131 seconds){noformat}
>  
>  
> It looks like the formatter in 
> [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
> uses the SMART resolver style by default.
> According to the JDK source: 
> https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103
>  
> {noformat}
>   /**
>      * Style to resolve dates and times strictly.
>      * 
>      * Using strict resolution will ensure that all parsed values are within
>      * the outer range of valid values for the field. Individual fields may
>      * be further processed for strictness.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using strict mode will ensure that the day-of-month is valid
>      * for the year-month, rejecting invalid values.
>      */
>     STRICT,
>     /**
>      * Style to resolve dates and times in a smart, or intelligent, manner.
>      * 
>      * Using smart resolution will perform the sensible default for each
>      * field, which may be the same as strict, the same as lenient, or a third
>      * behavior. Individual fields will interpret this differently.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using smart mode will ensure that the day-of-month is from
>      * 1 to 31, converting any value beyond the last valid day-of-month to be
>      * the last valid day-of-month.
>      */
>     SMART,{noformat}
>  
>  
> Therefore, we should set the resolverStyle to STRICT to reject invalid date 
> values.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27765) Backport of HIVE-20052: Arrow serde should fill ArrowColumnVector(Decimal) with the given schema precision/scale

2023-10-05 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-27765:

Summary: Backport of HIVE-20052: Arrow serde should fill 
ArrowColumnVector(Decimal) with the given schema precision/scale  (was: 
Backport of HIVE-20062: Arrow serde should fill ArrowColumnVector(Decimal) with 
the given schema precision/scale)

> Backport of HIVE-20052: Arrow serde should fill ArrowColumnVector(Decimal) 
> with the given schema precision/scale
> 
>
> Key: HIVE-27765
> URL: https://issues.apache.org/jira/browse/HIVE-27765
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27761) Compilation of nested CTEs throws SemanticException

2023-10-05 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27761.
---
Resolution: Fixed

Merged to master. Thanks [~soumyakanti.das] for the patch.

> Compilation of nested CTEs throws SemanticException
> ---
>
> Key: HIVE-27761
> URL: https://issues.apache.org/jira/browse/HIVE-27761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> Currently, nested CTEs are not supported in Hive. Simple repro:
> {code:java}
> with
> test1 as (
> with t1 as (select 1)
> select 1
> )
> select * from test1;
>  org.apache.hadoop.hive.ql.parse.SemanticException: Line 5:13 Ambiguous table 
> alias 't1'
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processCTE(SemanticAnalyzer.java:1310)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1980)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Description: 
For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns the 
timestamp of the last valid date of the month rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for day-of-month values larger than 31, e.g. 2001-02-32 
or 2023-04-32, UNIX_TIMESTAMP() does return NULL.

In Spark and MySQL, UNIX_TIMESTAMP() returns NULL (or 0) for all of these invalid dates.

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+--------+----------------+---------------+
| month  | datetimestamp  | timestampcol  |
+--------+----------------+---------------+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+--------+----------------+---------------+
5 rows selected (0.131 seconds){noformat}
 

 

It looks like the formatter in 
[InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
uses the SMART resolver style by default.

According to the JDK source: 
https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103

 
{noformat}
  /**
     * Style to resolve dates and times strictly.
     * 
     * Using strict resolution will ensure that all parsed values are within
     * the outer range of valid values for the field. Individual fields may
     * be further processed for strictness.
     * 
     * For example, resolving year-month and day-of-month in the ISO calendar
     * system using strict mode will ensure that the day-of-month is valid
     * for the year-month, rejecting invalid values.
     */
    STRICT,
    /**
     * Style to resolve dates and times in a smart, or intelligent, manner.
     * 
     * Using smart resolution will perform the sensible default for each
     * field, which may be the same as strict, the same as lenient, or a third
     * behavior. Individual fields will interpret this differently.
     * 
     * For example, resolving year-month and day-of-month in the ISO calendar
     * system using smart mode will ensure that the day-of-month is from
     * 1 to 31, converting any value beyond the last valid day-of-month to be
     * the last valid day-of-month.
     */
    SMART,{noformat}
 

 

Therefore, we should set the resolverStyle to STRICT to reject invalid date 
values.
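
For reference, the difference between the two resolver styles can be reproduced with plain java.time, independent of Hive (note that STRICT resolution needs the 'uuuu' year field instead of 'yyyy', because year-of-era is only resolved strictly when an era is present):
{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

public class ResolverStyleDemo {
  public static void main(String[] args) {
    // SMART (the current behaviour) clamps Feb 31 to the last valid day.
    DateTimeFormatter smart =
        DateTimeFormatter.ofPattern("yyyy-MM-dd").withResolverStyle(ResolverStyle.SMART);
    System.out.println(LocalDate.parse("2001-02-31", smart)); // 2001-02-28

    // STRICT rejects the invalid day-of-month instead of adjusting it.
    DateTimeFormatter strict =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);
    try {
      LocalDate.parse("2001-02-31", strict);
    } catch (DateTimeParseException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}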

 

  was:
For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns the 
timestamp of the last valid date of the month rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for day-of-month values larger than 31, e.g. 2001-02-32 
or 2023-04-32, UNIX_TIMESTAMP() does return NULL.

In Spark and MySQL, UNIX_TIMESTAMP() returns NULL (or 0) for all of these invalid dates.

 

 

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, 

[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Description: 
For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns the 
timestamp of the last valid date of the month rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for day-of-month values larger than 31, e.g. 2001-02-32 
or 2023-04-32, UNIX_TIMESTAMP() does return NULL.

In Spark and MySQL, UNIX_TIMESTAMP() returns NULL (or 0) for all of these invalid dates.

 

 

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+--------+----------------+---------------+
| month  | datetimestamp  | timestampcol  |
+--------+----------------+---------------+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+--------+----------------+---------------+
5 rows selected (0.131 seconds){noformat}

  was:
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+--------+----------------+---------------+
| month  | datetimestamp  | timestampcol  |
+--------+----------------+---------------+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+--------+----------------+---------------+
5 rows selected (0.131 seconds){noformat}


> Hive UNIX_TIMESTAMP() not returning null for invalid dates
> --
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns 
> the timestamp of the last valid date of the month rather than NULL (e.g. 
> UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to 
> '2001-02-28'). However, for day-of-month values larger than 31, e.g. 
> 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() does return NULL.
> In Spark and MySQL, UNIX_TIMESTAMP() returns NULL (or 0) for all of these 
> invalid dates.
>  
>  
>  
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 

[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Description: 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+--------+----------------+---------------+
| month  | datetimestamp  | timestampcol  |
+--------+----------------+---------------+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+--------+----------------+---------------+
5 rows selected (0.131 seconds){noformat}

> Hive UNIX_TIMESTAMP() not returning null for invalid dates
> --
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
> unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from 
> datetimetable;
> INFO  : Compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.102 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : Completed executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.0 seconds
> +--------+----------------+---------------+
> | month  | datetimestamp  | timestampcol  |
> +--------+----------------+---------------+
> | Feb    | 2001-02-28     | 983318400     |
> | Feb    | 2001-02-29     | 983318400     |
> | Feb    | 2001-02-30     | 983318400     |
> | Feb    | 2001-02-31     | 983318400     |
> | Feb    | 2001-02-32     | NULL          |
> +--------+----------------+---------------+
> 5 rows selected (0.131 seconds){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27772:
--

 Summary: Hive UNIX_TIMESTAMP() not returning null for invalid dates
 Key: HIVE-27772
 URL: https://issues.apache.org/jira/browse/HIVE-27772
 Project: Hive
  Issue Type: Bug
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26455) Remove PowerMockito from hive-exec

2023-10-05 Thread Sourabh Badhya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya resolved HIVE-26455.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master.
Thanks [~InvisibleProgrammer] for the contribution and [~ayushtkn], [~rkirtir], 
[~zratkai] for the reviews.

> Remove PowerMockito from hive-exec
> --
>
> Key: HIVE-26455
> URL: https://issues.apache.org/jira/browse/HIVE-26455
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Zsolt Miskolczi
>Assignee: Zsolt Miskolczi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> PowerMockito is a Mockito extension that introduces some pain points. Its main 
> purpose is to enable static mocking. Since then, mockito-inline has been 
> released as part of mockito-core; it does not require the vintage test runner 
> and it can mock objects on their own thread. 
> The goal is to stop using PowerMockito and use mockito-inline instead (see the 
> sketch after the package list).
>  
> The affected packages are: 
>  * org.apache.hadoop.hive.ql.exec.repl
>  * org.apache.hadoop.hive.ql.exec.repl.bootstrap.load
>  * org.apache.hadoop.hive.ql.exec.repl.ranger;
>  * org.apache.hadoop.hive.ql.exec.util
>  * org.apache.hadoop.hive.ql.parse.repl
>  * org.apache.hadoop.hive.ql.parse.repl.load.message
>  * org.apache.hadoop.hive.ql.parse.repl.metric
>  * org.apache.hadoop.hive.ql.txn.compactor
>  
>  
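
A small sketch of the replacement pattern for the static-mocking case (the class and method under test are made up for illustration): mockito-inline's Mockito.mockStatic scopes the stubbing to a try-with-resources block, with no special runner needed.
{code:java}
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mockStatic;

import org.junit.jupiter.api.Test;
import org.mockito.MockedStatic;

class StaticMockingSketchTest {

  // Hypothetical class with a static method, standing in for the real code under test.
  static final class ReplPolicyUtils {
    static String currentPolicy() { return "real-policy"; }
  }

  @Test
  void staticCallIsStubbedOnlyInsideTheBlock() {
    try (MockedStatic<ReplPolicyUtils> mocked = mockStatic(ReplPolicyUtils.class)) {
      mocked.when(ReplPolicyUtils::currentPolicy).thenReturn("test-policy");
      assertEquals("test-policy", ReplPolicyUtils.currentPolicy());
    }
    // Outside the try-with-resources block the real implementation applies again.
    assertEquals("real-policy", ReplPolicyUtils.currentPolicy());
  }
}
{code}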



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27771) Iceberg: Allow expire snapshot by time range

2023-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27771:
--
Labels: pull-request-available  (was: )

> Iceberg: Allow expire snapshot by time range
> 
>
> Key: HIVE-27771
> URL: https://issues.apache.org/jira/browse/HIVE-27771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Allow expiring snapshots by a time range, e.g.:
> Alter table ice01 execute expire_snapshot BETWEEN 'some time' AND 'some time'



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27771) Iceberg: Allow expire snapshot by time range

2023-10-05 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27771:
---

 Summary: Iceberg: Allow expire snapshot by time range
 Key: HIVE-27771
 URL: https://issues.apache.org/jira/browse/HIVE-27771
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Allow expiring snapshots by a time range, e.g.:

Alter table ice01 execute expire_snapshot BETWEEN 'some time' AND 'some time'



--
This message was sent by Atlassian Jira
(v8.20.10#820010)