[jira] [Updated] (HIVE-27802) Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27802:

Fix Version/s: 4.0.0

> Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce
> 
>
> Key: HIVE-27802
> URL: https://issues.apache.org/jira/browse/HIVE-27802
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The unit test added in HIVE-27723 is overcomplicated: there is no need to 
> mock a session pool manager to get a TezSessionState.
> We need to simply instantiate TezSessionState directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27802) Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce

2023-10-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776024#comment-17776024
 ] 

László Bodor commented on HIVE-27802:
-

Merged to master, thanks [~ayushtkn] for the review!

> Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce
> 
>
> Key: HIVE-27802
> URL: https://issues.apache.org/jira/browse/HIVE-27802
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The unit test added in HIVE-27723 is overcomplicated: there is no need to 
> mock a session pool manager to get a TezSessionState.
> We need to simply instantiate TezSessionState directly.





[jira] [Resolved] (HIVE-27802) Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-27802.
-
Resolution: Fixed

> Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce
> 
>
> Key: HIVE-27802
> URL: https://issues.apache.org/jira/browse/HIVE-27802
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The unit test added in HIVE-27723 is overcomplicated: there is no need to 
> mock a session pool manager to get a TezSessionState.
> We need to simply instantiate TezSessionState directly.





[jira] [Updated] (HIVE-27682) AlterTableAlterPartitionOperation cannot change the type if the column has default partition

2023-10-16 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-27682:
---
Summary: AlterTableAlterPartitionOperation cannot change the type if the 
column has default partition  (was: AlterTableAlterPartitionOperation cannot 
change the column type if the table has default partition)

> AlterTableAlterPartitionOperation cannot change the type if the column has 
> default partition
> 
>
> Key: HIVE-27682
> URL: https://issues.apache.org/jira/browse/HIVE-27682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Steps to reproduce:
> {noformat}
> create database pt;
> create table pt.alterdynamic_part_table(intcol string) partitioned by 
> (partcol1 string, partcol2 string);
> insert into table pt.alterdynamic_part_table partition(partcol1, partcol2) 
> select NULL, '2', NULL;
> alter table pt.alterdynamic_part_table partition column (partcol2 
> int);{noformat}
> Exception is thrown:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: Exception while checking 
> type conversion of existing partition values to FieldSchema(name:partcol2, 
> type:int, comment:null) : Exception while converting string to int for value 
> : NULL
>     at 
> org.apache.hadoop.hive.ql.ddl.table.partition.alter.AlterTableAlterPartitionOperation.check(AlterTableAlterPartitionOperation.java:69)
>     at 
> org.apache.hadoop.hive.ql.ddl.table.partition.alter.AlterTableAlterPartitionOperation.execute(AlterTableAlterPartitionOperation.java:55){noformat}
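
To illustrate the failure mode described above, here is a minimal Java sketch of a type-conversion check over existing partition values. It is not the code of AlterTableAlterPartitionOperation and not necessarily how the merged fix works: the class name, the method names, and the idea of skipping the default partition during the check are all assumptions made for illustration. The constant is the default value of hive.exec.default.partition.name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names below are illustrative and do not exist in Hive.
class PartitionTypeCheckSketch {
    // Default value of hive.exec.default.partition.name.
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    // True when the stored string partition value can be read as an int.
    static boolean convertibleToInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns the first existing partition value that blocks a change of the
    // partition column type to int, or null when no value blocks it.
    static String firstBlockingValue(List<String> partitionValues, boolean skipDefaultPartition) {
        for (String value : partitionValues) {
            if (skipDefaultPartition && DEFAULT_PARTITION_NAME.equals(value)) {
                continue; // assumption: the default partition carries no typed value
            }
            if (!convertibleToInt(value)) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("2", DEFAULT_PARTITION_NAME);
        System.out.println("without skip: " + firstBlockingValue(values, false));
        System.out.println("with skip:    " + firstBlockingValue(values, true));
    }
}
```

In this sketch a genuinely non-numeric value such as "abc" still blocks the type change; only the default partition placeholder is ignored.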





[jira] [Resolved] (HIVE-27682) AlterTableAlterPartitionOperation cannot change the column type if the table has default partition

2023-10-16 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27682.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged to master. Thank you [~hemanth619] for the review!

> AlterTableAlterPartitionOperation cannot change the column type if the table 
> has default partition
> --
>
> Key: HIVE-27682
> URL: https://issues.apache.org/jira/browse/HIVE-27682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>





[jira] [Commented] (HIVE-27169) New Locked List to prevent configuration change at runtime without throwing error

2023-10-16 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775816#comment-17775816
 ] 

Pravin Sinha commented on HIVE-27169:
-

Merged to master. Thanks [~Aggarwal_Raghav] for the patch and [~okumin] for the 
review!

> New Locked List to prevent configuration change at runtime without throwing 
> error
> -
>
> Key: HIVE-27169
> URL: https://issues.apache.org/jira/browse/HIVE-27169
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-alpha-2
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> _*AIM*_
> Create a new locked list called _*hive.conf.locked.list*_ which contains 
> comma-separated configurations that can't be changed at runtime. If 
> someone tries to change one of them at runtime, a WARN log is shown in beeline 
> and the config is left unchanged.
>  
> _*How is it different from Restricted List?*_
> When running an hql file or at runtime, if a configuration present in the 
> restricted list is updated, an error is thrown and execution of the hql file 
> does not proceed.
> With the locked list, updating a locked configuration produces a 
> _*WARN*_ log in beeline and execution of the hql file continues.
>  
> _*Why is it required?*_
> In organisations, admins want to enforce some configs that users shouldn't be 
> able to change at runtime, without affecting users' existing hql 
> scripts. The locked list is useful here because it prevents users 
> from changing the value of particular configs without stopping the 
> execution of hql scripts.
>  
> {_}*NOTE*{_}: _*hive.conf.locked.list*_ can only be set at cluster level, and 
> the Hive service needs to be restarted afterwards.
> This will be very helpful when organisations are migrating from Hive 1.x or 
> Hive 2.x to a higher version and admins want to enforce some configuration that 
> should remain constant.
>  
>  
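
The difference between the two lists described above can be sketched in Java as follows. This is a hypothetical, self-contained illustration and not Hive's HiveConf code: the class LockedListSketch, its methods, and the example.* keys are assumptions. Only the behaviour is taken from the description: a restricted key raises an error, while a locked key logs a warning and keeps its current value.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; this is not how Hive implements its configuration lists.
class LockedListSketch {
    private final Set<String> restricted;
    private final Set<String> locked;
    private final Map<String, String> values = new HashMap<>();

    LockedListSketch(Set<String> restricted, Set<String> locked, Map<String, String> initial) {
        this.restricted = new HashSet<>(restricted);
        this.locked = new HashSet<>(locked);
        this.values.putAll(initial);
    }

    // Parses a comma-separated list of keys, the format described for hive.conf.locked.list.
    static Set<String> parseList(String commaSeparated) {
        Set<String> keys = new HashSet<>();
        for (String key : commaSeparated.split(",")) {
            if (!key.trim().isEmpty()) {
                keys.add(key.trim());
            }
        }
        return keys;
    }

    // Returns true when the value was applied. A restricted key aborts with an
    // exception; a locked key only logs a warning and keeps the old value.
    boolean set(String key, String value) {
        if (restricted.contains(key)) {
            throw new IllegalArgumentException("Cannot modify " + key + " at runtime: restricted list");
        }
        if (locked.contains(key)) {
            System.err.println("WARN: Cannot modify " + key + " at runtime: locked list, value unchanged");
            return false;
        }
        values.put(key, value);
        return true;
    }

    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("example.locked.key", "admin-value");
        LockedListSketch conf = new LockedListSketch(
            parseList("example.restricted.key"),
            parseList("example.locked.key, another.locked.key"),
            initial);
        conf.set("example.locked.key", "user-value"); // warns, script would continue
        System.out.println(conf.get("example.locked.key"));
    }
}
```

Returning a boolean instead of throwing is what lets a script keep running after an attempt to change a locked key.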





[jira] [Resolved] (HIVE-27169) New Locked List to prevent configuration change at runtime without throwing error

2023-10-16 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha resolved HIVE-27169.
-
Resolution: Fixed

> New Locked List to prevent configuration change at runtime without throwing 
> error
> -
>
> Key: HIVE-27169
> URL: https://issues.apache.org/jira/browse/HIVE-27169
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-alpha-2
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-27772) UNIX_TIMESTAMP should return NULL when date fields are out of bounds

2023-10-16 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Description: 
For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns 
the timestamp of the last valid date of the month rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for days of the month larger than 31, e.g. 2001-02-32 or 
2023-04-32, UNIX_TIMESTAMP() returns NULL.

In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is always NULL (or 0).

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+--------+----------------+---------------+
| month  | datetimestamp  | timestampcol  |
+--------+----------------+---------------+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+--------+----------------+---------------+
5 rows selected (0.131 seconds){noformat}
 

 

 

According to the JDK source: 
[https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103]

 
{noformat}
  /**
     * Style to resolve dates and times strictly.
     * 
     * Using strict resolution will ensure that all parsed values are within
     * the outer range of valid values for the field. Individual fields may
     * be further processed for strictness.
     * 
     * For example, resolving year-month and day-of-month in the ISO calendar
     * system using strict mode will ensure that the day-of-month is valid
     * for the year-month, rejecting invalid values.
     */
    STRICT,
    /**
     * Style to resolve dates and times in a smart, or intelligent, manner.
     * 
     * Using smart resolution will perform the sensible default for each
     * field, which may be the same as strict, the same as lenient, or a third
     * behavior. Individual fields will interpret this differently.
     * 
     * For example, resolving year-month and day-of-month in the ISO calendar
     * system using smart mode will ensure that the day-of-month is from
     * 1 to 31, converting any value beyond the last valid day-of-month to be
     * the last valid day-of-month.
     */
    SMART,{noformat}
 

 

By default, the DATETIME formatter uses the SMART resolution style and the 
SIMPLE formatter uses LENIENT. Both of these styles resolve out-of-bounds 
field values to valid dates. To prevent such "invalid" dates from being 
parsed successfully, we have to use the STRICT resolution style. However, we 
cannot simply switch the formatters to always use STRICT resolution, because 
that would break existing applications relying on the current resolution 
rules. To address the problem reported here and retain the previous behaviour, 
we opted to make the resolution style configurable by adding a new property. 
The new property only affects the DATETIME formatter; the SIMPLE formatter is 
almost deprecated, so we don't add new features to it.
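
The effect of the resolver styles discussed above can be reproduced with plain java.time, outside of Hive. The sketch below is only an illustration under stated assumptions: the class ResolverStyleDemo and the helper parseOrNull are hypothetical, and the pattern uses "uuuu" rather than "yyyy" because, under STRICT, a year-of-era cannot be resolved to a date without an era field. Rejected input is mapped to null to mirror a SQL NULL.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical demo class; it does not use any Hive code.
class ResolverStyleDemo {
    // Parses text as a date under the given resolver style and returns
    // null when the formatter rejects the input.
    static LocalDate parseOrNull(String text, ResolverStyle style) {
        DateTimeFormatter formatter =
            DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(style);
        try {
            return LocalDate.parse(text, formatter);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"2001-02-28", "2001-02-31", "2001-02-32"};
        for (ResolverStyle style : ResolverStyle.values()) {
            for (String input : inputs) {
                System.out.println(style + " " + input + " -> " + parseOrNull(input, style));
            }
        }
    }
}
```

With SMART, 2001-02-31 is clamped to the last valid day of February while 2001-02-32 is rejected, which matches the table in the report; with STRICT, both are rejected.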





 

  was:
For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns 
the timestamp of the last valid date of the month rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for days of the month larger than 31, e.g. 2001-02-32 or 
2023-04-32, UNIX_TIMESTAMP() returns NULL.

In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is always NULL (or 0).

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from 

[jira] [Commented] (HIVE-27772) UNIX_TIMESTAMP should return NULL when date fields are out of bounds

2023-10-16 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775767#comment-17775767
 ] 

Simhadri Govindappa commented on HIVE-27772:


Thanks [~zabetak] for the review. 

I will update the wiki and the Jira description.

> UNIX_TIMESTAMP should return NULL when date fields are out of bounds
> 
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> For invalid dates such as 2001-02-31 or 2023-04-31, UNIX_TIMESTAMP() returns 
> the timestamp of the last valid date of the month rather than NULL 
> (e.g. UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which 
> converts to '2001-02-28'). However, for days of the month larger than 31, e.g. 
> 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() returns NULL.
> In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is always NULL (or 
> 0).
>  
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
> unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from 
> datetimetable;
> INFO  : Compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
> timestampCol from datetimetable
> INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.102 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
> timestampCol from datetimetable
> INFO  : Completed executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.0 seconds
> +--------+----------------+---------------+
> | month  | datetimestamp  | timestampcol  |
> +--------+----------------+---------------+
> | Feb    | 2001-02-28     | 983318400     |
> | Feb    | 2001-02-29     | 983318400     |
> | Feb    | 2001-02-30     | 983318400     |
> | Feb    | 2001-02-31     | 983318400     |
> | Feb    | 2001-02-32     | NULL          |
> +--------+----------------+---------------+
> 5 rows selected (0.131 seconds){noformat}
>  
>  
> It looks like 
> [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
>   by default, the formatter has the SMART resolver style.
> According to java jdk : 
> https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103
>  
> {noformat}
>   /**
>      * Style to resolve dates and times strictly.
>      * 
>      * Using strict resolution will ensure that all parsed values are within
>      * the outer range of valid values for the field. Individual fields may
>      * be further processed for strictness.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using strict mode will ensure that the day-of-month is valid
>      * for the year-month, rejecting invalid values.
>      */
>     STRICT,
>     /**
>      * Style to resolve dates and times in a smart, or intelligent, manner.
>      * 
>      * Using smart resolution will perform the sensible default for each
>      * field, which may be the same as strict, the same as lenient, or a third
>      * behavior. Individual fields will interpret this differently.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using smart mode will ensure that the day-of-month is from
>      * 1 to 31, converting any value beyond the last valid day-of-month to be
>      * the last valid day-of-month.
>      */
>     SMART,{noformat}
>  
>  
> Therefore, we should set the resolverStyle to STRICT to reject invalid date 
> values.
>  





[jira] [Resolved] (HIVE-27772) UNIX_TIMESTAMP should return NULL when date fields are out of bounds

2023-10-16 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-27772.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/1c126d947448ffc9784a1465306e018ba183a014. 
Thanks for the PR [~simhadri-g]!

[~simhadri-g] If you need to update the description here based on the last 
changes that were merged please do so. 

Please update the [Hive 
wiki|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Datetime]
 with the new configuration property introduced here.

> UNIX_TIMESTAMP should return NULL when date fields are out of bounds
> 
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>

[jira] [Updated] (HIVE-27772) UNIX_TIMESTAMP should return NULL when date fields are out of bounds

2023-10-16 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27772:
---
Summary: UNIX_TIMESTAMP should return NULL when date fields are out of 
bounds  (was: Hive UNIX_TIMESTAMP()should return null for invalid dates)

> UNIX_TIMESTAMP should return NULL when date fields are out of bounds
> 
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>





[jira] [Work started] (HIVE-27802) Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27802 started by László Bodor.
---
> Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce
> 
>
> Key: HIVE-27802
> URL: https://issues.apache.org/jira/browse/HIVE-27802
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>
> The unit test added in HIVE-27723 is overcomplicated: there is no need to 
> mock a session pool manager to get a TezSessionState.
> We need to simply instantiate TezSessionState directly.





[jira] [Updated] (HIVE-27802) Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce

2023-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27802:
--
Labels: pull-request-available  (was: )

> Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce
> 
>
> Key: HIVE-27802
> URL: https://issues.apache.org/jira/browse/HIVE-27802
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>
> The unit test added in HIVE-27723 is overcomplicated: there is no need to 
> mock a session pool manager to get a TezSessionState.
> We need to simply instantiate TezSessionState directly.





[jira] [Updated] (HIVE-27802) Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27802:

Description: 
The unit test added in HIVE-27723 is overcomplicated: there is no need to mock 
a session pool manager to get a TezSessionState.
We need to simply instantiate TezSessionState directly.

> Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce
> 
>
> Key: HIVE-27802
> URL: https://issues.apache.org/jira/browse/HIVE-27802
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> The unit test added in HIVE-27723 is overcomplicated: there is no need to 
> mock a session pool manager to get a TezSessionState.
> We need to simply instantiate TezSessionState directly.





[jira] [Created] (HIVE-27802) Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce

2023-10-16 Thread Jira
László Bodor created HIVE-27802:
---

 Summary: Simplify 
TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce
 Key: HIVE-27802
 URL: https://issues.apache.org/jira/browse/HIVE-27802
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor








[jira] [Assigned] (HIVE-27802) Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-27802:
---

Assignee: László Bodor

> Simplify TestTezSessionState.testSymlinkedLocalFilesAreLocalizedOnce
> 
>
> Key: HIVE-27802
> URL: https://issues.apache.org/jira/browse/HIVE-27802
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>






[jira] [Commented] (HIVE-20605) merge master-tez092 branch into master

2023-10-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-20605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775695#comment-17775695
 ] 

László Bodor commented on HIVE-20605:
-

Just a heads-up: I found that HIVE-20547 has still not been merged back to master.

> merge master-tez092 branch into master
> --
>
> Key: HIVE-20605
> URL: https://issues.apache.org/jira/browse/HIVE-20605
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> I got tired of waiting for Tez 0.92 release (it's been pending for half a 
> year) so I created a branch to prevent various patches from conflicting with 
> each other.
> This jira is to merge them into master after Tez 0.92 is finally released.
> The jiras here: 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%20master-tez092
>  should then be updated with the corresponding Hive release version.





[jira] [Work started] (HIVE-27276) Enable debug options

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27276 started by Zoltán Rátkai.

> Enable debug options
> 
>
> Key: HIVE-27276
> URL: https://issues.apache.org/jira/browse/HIVE-27276
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Zoltán Rátkai
>Priority: Major
>
> {code}
> -p9866:9866 -p1:1 -p10001:10001 -p9000:9000 -p8000:8000 -p3306:3306 
> -p50070:50070 -p50030:50030
> {code}
> For debug purpose, you can launch the container with:
> docker run -d -p 9083:9083 -p 8009:8009 --env 
> SERVICE_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8009"
>  --env SERVICE_NAME=metastore --name metastore-standalone 
> apache/hive:4.0.0-SNAPSHOT
> SERVICE_OPTS will finally propagate to the JVM args of the service.
> PR for testing it out, #4240
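
As a rough illustration of the launch command quoted above, the following Python sketch assembles the same `docker run` arguments with a configurable debug port. The helper name `build_debug_command` and its parameters are hypothetical; only the flags, environment variables, and image tag are taken from the issue text.

```python
import shlex


def build_debug_command(debug_port=8009, service="metastore",
                        image="apache/hive:4.0.0-SNAPSHOT"):
    """Return the docker argv for a container with a JDWP agent enabled.

    Hypothetical helper: it only mirrors the command quoted in the issue.
    """
    # JDWP agent string as quoted in the issue; suspend=n lets the JVM start
    # without waiting for a debugger to attach.
    jdwp = ("-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,"
            f"address={debug_port}")
    return [
        "docker", "run", "-d",
        "-p", "9083:9083",                    # service port from the issue
        "-p", f"{debug_port}:{debug_port}",   # expose the debug port
        "--env", f"SERVICE_OPTS={jdwp}",
        "--env", f"SERVICE_NAME={service}",
        "--name", f"{service}-standalone",
        image,
    ]


if __name__ == "__main__":
    # Print a shell-quoted version of the command for copy and paste.
    print(shlex.join(build_debug_command()))
```

Keeping the port as a parameter makes it harder for the published port and the JDWP `address` to drift apart, which would leave the debugger unable to attach.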





[jira] [Commented] (HIVE-27276) Enable debug options

2023-10-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775611#comment-17775611
 ] 

Zoltán Rátkai commented on HIVE-27276:
--

Tested debugging by adding SERVICE_OPTS; it works.

> Enable debug options
> 
>
> Key: HIVE-27276
> URL: https://issues.apache.org/jira/browse/HIVE-27276
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Zoltán Rátkai
>Priority: Major
>
> {code}
> -p9866:9866 -p1:1 -p10001:10001 -p9000:9000 -p8000:8000 -p3306:3306 
> -p50070:50070 -p50030:50030
> {code}
> For debug purpose, you can launch the container with:
> docker run -d -p 9083:9083 -p 8009:8009 --env 
> SERVICE_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8009"
>  --env SERVICE_NAME=metastore --name metastore-standalone 
> apache/hive:4.0.0-SNAPSHOT
> SERVICE_OPTS will finally propagate to the JVM args of the service.
> PR for testing it out, #4240





[jira] [Resolved] (HIVE-27276) Enable debug options

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Rátkai resolved HIVE-27276.
--
Resolution: Fixed

> Enable debug options
> 
>
> Key: HIVE-27276
> URL: https://issues.apache.org/jira/browse/HIVE-27276
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Zoltán Rátkai
>Priority: Major
>
> {code}
> -p9866:9866 -p1:1 -p10001:10001 -p9000:9000 -p8000:8000 -p3306:3306 
> -p50070:50070 -p50030:50030
> {code}
> For debug purpose, you can launch the container with:
> docker run -d -p 9083:9083 -p 8009:8009 --env 
> SERVICE_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8009"
>  --env SERVICE_NAME=metastore --name metastore-standalone 
> apache/hive:4.0.0-SNAPSHOT
> SERVICE_OPTS will finally propagate to the JVM args of the service.
> PR for testing it out, #4240





[jira] [Updated] (HIVE-27686) Use ORC 1.8.5.

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27686:

Fix Version/s: 4.0.0

> Use ORC 1.8.5.
> --
>
> Key: HIVE-27686
> URL: https://issues.apache.org/jira/browse/HIVE-27686
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> ORC-1413 fixed a bug to use ORC row level filter, it was released in ORC 
> 1.8.4, so use the latest from 1.8.x





[jira] [Commented] (HIVE-27686) Use ORC 1.8.5.

2023-10-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775605#comment-17775605
 ] 

László Bodor commented on HIVE-27686:
-

merged to master, thanks [~zratkai] for the patch and [~InvisibleProgrammer], 
[~aturoczy], [~zhangbutao] for the reviews!

> Use ORC 1.8.5.
> --
>
> Key: HIVE-27686
> URL: https://issues.apache.org/jira/browse/HIVE-27686
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: pull-request-available
>
> ORC-1413 fixed a bug to use ORC row level filter, it was released in ORC 
> 1.8.4, so use the latest from 1.8.x





[jira] [Resolved] (HIVE-27686) Use ORC 1.8.5.

2023-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-27686.
-
Resolution: Fixed

> Use ORC 1.8.5.
> --
>
> Key: HIVE-27686
> URL: https://issues.apache.org/jira/browse/HIVE-27686
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> ORC-1413 fixed a bug to use ORC row level filter, it was released in ORC 
> 1.8.4, so use the latest from 1.8.x





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-10-16 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Description: 
reproduce (no rows should be returned):
{code}
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=false;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists store_sales;

create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
int) stored as orc tblproperties('transactional'='true');
insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);

explain cbo 
select * from store_sales A where exists ( 
select 1 from store_sales B 
where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
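A.ss_sales_price

select * from store_sales A where exists(
select 1 from store_sales B
where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

explain cbo
select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price
{code}
plan diff:
[^Screenshot 2023-10-10 at 20.14.03.png]

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan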
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce (no rows should be returned):
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> {code}
> plan diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]
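
To make the intended equivalence concrete, here is a small Python sketch of the semantics the rewrite is expected to preserve: an EXISTS filter and a LEFT SEMI JOIN should return the same outer rows. It is only a model, not Hive's planner. The function names are hypothetical, and because the last comparison in the reproduction script is cut off in this archive, the sketch uses only the two legible conditions and therefore does not reproduce the "no rows should be returned" result.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int, int]  # (ss_promo_sk, ss_sales_price, ss_list_price)

# Rows inserted by the reproduction script.
STORE_SALES: List[Row] = [(1, 20, 15), (1, 15, 20), (1, 10, 15)]


def known_predicate(a: Row, b: Row) -> bool:
    """Only the two conditions that are legible in the issue text."""
    return a[0] == b[0] and a[1] > b[2]


def exists_filter(rows: List[Row], pred: Callable[[Row, Row], bool]) -> List[Row]:
    """WHERE EXISTS (...): keep an outer row if any inner row matches."""
    return [a for a in rows if any(pred(a, b) for b in rows)]


def left_semi_join(rows: List[Row], pred: Callable[[Row, Row], bool]) -> List[Row]:
    """LEFT SEMI JOIN: emit each left row at most once, on its first match."""
    out = []
    for a in rows:
        for b in rows:
            if pred(a, b):
                out.append(a)
                break  # one match is enough; never duplicate the left row
    return out


if __name__ == "__main__":
    print(exists_filter(STORE_SALES, known_predicate))
    print(left_semi_join(STORE_SALES, known_predicate))
```

Any plan in which the two forms disagree on the same predicate would indicate the kind of incorrect rewrite the issue reports.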





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-10-16 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Description: 
reproduce (no rows should be returned):
{code}
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=false;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists store_sales;

create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
int) stored as orc tblproperties('transactional'='true');
insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);

explain cbo 
select * from store_sales A where exists ( 
select 1 from store_sales B 
where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
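A.ss_sales_price

select * from store_sales A where exists(
select 1 from store_sales B
where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

explain cbo
select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price
{code}
Plans diff:
[^Screenshot 2023-10-10 at 20.14.03.png]

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan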
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce (no rows should be returned):
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> {code}
> Plans diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-10-16 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Description: 
reproduce (no rows should be returned):
{code}
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=false;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists store_sales;

create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
int) stored as orc tblproperties('transactional'='true');
insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);

explain cbo 
select * from store_sales A where exists ( 
select 1 from store_sales B 
where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
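A.ss_sales_price

select * from store_sales A where exists(
select 1 from store_sales B
where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

explain cbo
select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price
{code}
plans diff:
[^Screenshot 2023-10-10 at 20.14.03.png]

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan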
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce (no rows should be returned):
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> {code}
> plans diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-10-16 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Description: 
reproduce:
{code}
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=false;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists store_sales;

create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
int) stored as orc tblproperties('transactional'='true');
insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);

explain cbo 
select * from store_sales A where exists ( 
select 1 from store_sales B 
where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
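A.ss_sales_price

select * from store_sales A where exists(
select 1 from store_sales B
where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

explain cbo
select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price
{code}
Plans diff:
[^Screenshot 2023-10-10 at 20.14.03.png]

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan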
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce:
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> {code}
> Plans diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-10-16 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Description: 
reproduce:
{code}
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=false;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists store_sales;

create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
int) stored as orc tblproperties('transactional'='true');
insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);

explain cbo 
select * from store_sales A where exists ( 
select 1 from store_sales B 
where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
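A.ss_sales_price

select * from store_sales A where exists(
select 1 from store_sales B
where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

explain cbo
select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price
{code}
[^Screenshot 2023-10-10 at 20.14.03.png]

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan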
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce:
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> {code}
> [^Screenshot 2023-10-10 at 20.14.03.png]





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-10-16 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Description: 
reproduce:
{code}
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=false;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists store_sales;

create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
int) stored as orc tblproperties('transactional'='true');
insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);

explain cbo 
select * from store_sales A where exists ( 
select 1 from store_sales B 
where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
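A.ss_sales_price

select * from store_sales A where exists(
select 1 from store_sales B
where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

explain cbo
select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price
{code}
[^]

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan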
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce:
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> {code}
> [^]





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-10-16 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Attachment: Screenshot 2023-10-10 at 20.14.03.png

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce:
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price
> {code}





[jira] [Created] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-10-16 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-27801:
-

 Summary: Exists subquery rewrite into LEFT SEMI JOIN produce 
incorrect plan
 Key: HIVE-27801
 URL: https://issues.apache.org/jira/browse/HIVE-27801
 Project: Hive
  Issue Type: Bug
Reporter: Denys Kuzmenko


reproduce:
{code}
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=false;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists store_sales;

create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
int) stored as orc tblproperties('transactional'='true');
insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);

explain cbo 
select * from store_sales A where exists ( 
select 1 from store_sales B 
where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
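A.ss_sales_price

select * from store_sales A where exists(
select 1 from store_sales B
where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

explain cbo
select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price

select * from store_sales A
LEFT SEMI JOIN store_sales B
ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and
A.ss_sales_price
{code}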

[jira] [Commented] (HIVE-27798) Correct configuration item in hive-site.xml in docker.

2023-10-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775593#comment-17775593
 ] 

Ayush Saxena commented on HIVE-27798:
-

Committed to master.

Thanks [~xiaolin84250] for the contribution!

> Correct configuration item in hive-site.xml in docker.
> --
>
> Key: HIVE-27798
> URL: https://issues.apache.org/jira/browse/HIVE-27798
> Project: Hive
>  Issue Type: Bug
> Environment: docker
>Reporter: 易霖威
>Assignee: 易霖威
>Priority: Critical
>  Labels: conf, config, docker, properties, pull-request-available
> Attachments: image-2023-10-14-09-16-41-648.png, 
> image-2023-10-14-09-17-47-281.png
>
>
> hive.metastore.warehouse.dir, this configuration item is configured 
> incorrectly, causing the configuration item to not take effect.
> bug image:
> !image-2023-10-14-09-16-41-648.png!
> !https://user-images.githubusercontent.com/38107489/274453211-f2a28ee6-16b2-44fb-8ad2-a4736fb21104.png!
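
The issue text does not say exactly how the item was misconfigured (the details are only in the attached screenshots), so the following Python sketch shows just one way such a problem could be caught: parse a Hadoop-style configuration file and check that an expected key is actually defined. The sample XML, the path value, and the helper names `read_properties` and `missing_keys` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml content; the path value is illustrative only.
SAMPLE_HIVE_SITE = """\
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/opt/hive/data/warehouse</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Map each <property> name to its value in a Hadoop-style config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_keys(xml_text, required):
    """Return the required keys that the file does not define."""
    props = read_properties(xml_text)
    return [key for key in required if key not in props]


if __name__ == "__main__":
    print(read_properties(SAMPLE_HIVE_SITE))
    print(missing_keys(SAMPLE_HIVE_SITE, ["hive.metastore.warehouse.dir"]))
```

A misspelled property name is silently ignored by this kind of configuration format, which is why an explicit check for the expected key can be useful.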





[jira] [Resolved] (HIVE-27798) Correct configuration item in hive-site.xml in docker.

2023-10-16 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27798.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Correct configuration item in hive-site.xml in docker.
> --
>
> Key: HIVE-27798
> URL: https://issues.apache.org/jira/browse/HIVE-27798
> Project: Hive
>  Issue Type: Bug
> Environment: docker
>Reporter: 易霖威
>Assignee: 易霖威
>Priority: Critical
>  Labels: conf, config, docker, properties, pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2023-10-14-09-16-41-648.png, 
> image-2023-10-14-09-17-47-281.png
>
>
> hive.metastore.warehouse.dir, this configuration item is configured 
> incorrectly, causing the configuration item to not take effect.
> bug image:
> !image-2023-10-14-09-16-41-648.png!
> !https://user-images.githubusercontent.com/38107489/274453211-f2a28ee6-16b2-44fb-8ad2-a4736fb21104.png!





[jira] [Updated] (HIVE-27798) Correct configuration item in hive-site.xml in docker.

2023-10-16 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27798:

Priority: Major  (was: Critical)

> Correct configuration item in hive-site.xml in docker.
> --
>
> Key: HIVE-27798
> URL: https://issues.apache.org/jira/browse/HIVE-27798
> Project: Hive
>  Issue Type: Bug
> Environment: docker
>Reporter: 易霖威
>Assignee: 易霖威
>Priority: Major
>  Labels: conf, config, docker, properties, pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2023-10-14-09-16-41-648.png, 
> image-2023-10-14-09-17-47-281.png
>
>
> hive.metastore.warehouse.dir, this configuration item is configured 
> incorrectly, causing the configuration item to not take effect.
> bug image:
> !image-2023-10-14-09-16-41-648.png!
> !https://user-images.githubusercontent.com/38107489/274453211-f2a28ee6-16b2-44fb-8ad2-a4736fb21104.png!





[jira] [Resolved] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q

2023-10-16 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-26828.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/c126422a91be695c75ec4a750638a0aa4d1ba6cd.

> Fix OOM for hybridgrace_hashjoin_2.q
> 
>
> Key: HIVE-26828
> URL: https://issues.apache.org/jira/browse/HIVE-26828
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Tez
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>
> _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
> transiently (from [flaky_test 
> output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
>  in case it disappears):
> {quote}< Status: Failed
> < Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex 
> vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex 
> Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
> <  A masked pattern was here 
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> <  A masked pattern was here 
> < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> <  A masked pattern was here 
> < ]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
> < FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, 
> vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed 
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
> vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
> hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
> <  A masked pattern was here 
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> <  A masked pattern was here 
> < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> <  A masked pattern was here 
> < ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed 
> due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
> OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
> OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG 
> did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
> < PREHOOK: query: SELECT COUNT( * )
> < FROM src1 x
> < JOIN srcpart z1 ON (x.key = z1.key)
> < JOIN src y1 ON (x.key = y1.key)
> < JOIN srcpart z2 ON (x.value = z2.value)
> < JOIN src y2 ON (x.value = y2.value)
> < WHERE z1.key < '' AND z2.key < 'zz'
> < AND y1.value < '' AND y2.value < 'zz'
> < PREHOOK: type: QUERY
> < PREHOOK: Input: default@src
> < PREHOOK: Input: default@src1
> < PREHOOK: Input: default@srcpart
> < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11
> < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12
> < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11
> < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12
> < PREHOOK: Output: hdfs://### HDFS PATH ###
> {quote}
> The aim of this ticket is to investigate the issue, fix it and re-enable the 
> test.
> The problem seems to lie in the deserialization of the computed Tez DAG plan.




[jira] [Resolved] (HIVE-27182) tez_union_with_udf.q with TestMiniTezCliDriver is flaky

2023-10-16 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-27182.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/c126422a91be695c75ec4a750638a0aa4d1ba6cd.

> tez_union_with_udf.q with TestMiniTezCliDriver is flaky
> ---
>
> Key: HIVE-27182
> URL: https://issues.apache.org/jira/browse/HIVE-27182
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>
> Looks like a memory issue:
> {noformat}
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> < colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc)
> < conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> {noformat}
> Ref: 
> http://ci.hive.apache.org/job/hive-precommit/job/PR-4155/2/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_20___PostProcess___testCliDriver_tez_union_with_udf_/





[jira] [Assigned] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q

2023-10-16 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-26828:
--

Assignee: Stamatis Zampetakis

> Fix OOM for hybridgrace_hashjoin_2.q
> 
>
> Key: HIVE-26828
> URL: https://issues.apache.org/jira/browse/HIVE-26828
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Tez
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
> transiently (from [flaky_test 
> output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
>  in case it disappears):
> {quote}< Status: Failed
> < Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex 
> vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex 
> Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
> <  A masked pattern was here 
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> <  A masked pattern was here 
> < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> <  A masked pattern was here 
> < ]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
> < FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, 
> vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed 
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
> vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
> hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
> <  A masked pattern was here 
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> <  A masked pattern was here 
> < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> <  A masked pattern was here 
> < ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed 
> due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
> OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
> OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG 
> did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
> < PREHOOK: query: SELECT COUNT( * )
> < FROM src1 x
> < JOIN srcpart z1 ON (x.key = z1.key)
> < JOIN src y1 ON (x.key = y1.key)
> < JOIN srcpart z2 ON (x.value = z2.value)
> < JOIN src y2 ON (x.value = y2.value)
> < WHERE z1.key < '' AND z2.key < 'zz'
> < AND y1.value < '' AND y2.value < 'zz'
> < PREHOOK: type: QUERY
> < PREHOOK: Input: default@src
> < PREHOOK: Input: default@src1
> < PREHOOK: Input: default@srcpart
> < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11
> < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12
> < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11
> < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12
> < PREHOOK: Output: hdfs://### HDFS PATH ###
> {quote}
> The aim of this ticket is to investigate the issue, fix it and re-enable the 
> test.
> The problem seems to lie in the deserialization of the computed Tez DAG plan.





[jira] [Assigned] (HIVE-27182) tez_union_with_udf.q with TestMiniTezCliDriver is flaky

2023-10-16 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-27182:
--

Assignee: Stamatis Zampetakis

> tez_union_with_udf.q with TestMiniTezCliDriver is flaky
> ---
>
> Key: HIVE-27182
> URL: https://issues.apache.org/jira/browse/HIVE-27182
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Looks like a memory issue:
> {noformat}
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> < colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc)
> < conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> {noformat}
> Ref: 
> http://ci.hive.apache.org/job/hive-precommit/job/PR-4155/2/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_20___PostProcess___testCliDriver_tez_union_with_udf_/





[jira] [Resolved] (HIVE-27695) Intermittent OOM when running TestMiniTezCliDriver

2023-10-16 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-27695.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/c126422a91be695c75ec4a750638a0aa4d1ba6cd. 
Thanks for the review [~ayushsaxena]!

> Intermittent OOM when running TestMiniTezCliDriver
> --
>
> Key: HIVE-27695
> URL: https://issues.apache.org/jira/browse/HIVE-27695
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: am_heap_dumps.tar.xz, dag_am_debug_bundles.tar.xz, 
> leak_suspect_1.png
>
>
> Running all the tests under TestMiniTezCliDriver leads to OutOfMemory errors 
> very frequently (but still intermittently).
> {noformat}
> cd itests/qtest && mvn test -Dtest=TestMiniTezCliDriver
> {noformat}
> I set {{-XX:+HeapDumpOnOutOfMemoryError}} and the resulting heap dumps are 
> attached to this ticket.
> The OOM is thrown from the application master, and a quick inspection of the 
> dumps shows that it comes mainly from the accumulation of Configuration 
> objects (~1MB each) by various classes.
> The max heap size for the application master is pretty low (~100MB), so the 
> limit is quite easy to reach. The heap size is deliberately set very low for 
> testing purposes, but maybe we should re-evaluate the current configurations 
> for the tests.
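
The figures in the description allow a rough back-of-the-envelope bound on how quickly the heap fills up. The sketch below is only illustrative arithmetic in Python: the ~1MB and ~100MB values are the approximations quoted in the ticket, while the function name and the baseline_mb parameter (heap already used by everything else) are assumptions introduced here.

```python
def max_retained_objects(heap_mb, object_mb, baseline_mb=0.0):
    """Upper bound on how many objects of `object_mb` fit in the remaining heap."""
    if object_mb <= 0:
        raise ValueError("object_mb must be positive")
    available_mb = max(heap_mb - baseline_mb, 0.0)
    return int(available_mb // object_mb)


# Approximate figures from the ticket: ~100MB heap, ~1MB per Configuration.
print(max_retained_objects(100, 1))

# Hypothetical case: half of the heap is already used by other data.
print(max_retained_objects(100, 1, baseline_mb=50))
```

Under these assumptions, on the order of a hundred retained Configuration objects would be enough to exhaust the application master heap, which is consistent with the limit being easy to reach.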





[jira] [Updated] (HIVE-27798) Correct configuration item in hive-site.xml in docker.

2023-10-16 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27798:

Summary: Correct configuration item in hive-site.xml in docker.  (was: 
There is an incorrect configuration item in hive-site.xml in docker.)

> Correct configuration item in hive-site.xml in docker.
> --
>
> Key: HIVE-27798
> URL: https://issues.apache.org/jira/browse/HIVE-27798
> Project: Hive
>  Issue Type: Bug
> Environment: docker
>Reporter: 易霖威
>Assignee: 易霖威
>Priority: Critical
>  Labels: conf, config, docker, properties, pull-request-available
> Attachments: image-2023-10-14-09-16-41-648.png, 
> image-2023-10-14-09-17-47-281.png
>
>
> The hive.metastore.warehouse.dir configuration item is configured 
> incorrectly, so it does not take effect.
> bug image:
> !image-2023-10-14-09-16-41-648.png!
> !https://user-images.githubusercontent.com/38107489/274453211-f2a28ee6-16b2-44fb-8ad2-a4736fb21104.png!





[jira] [Updated] (HIVE-27784) Backport of HIVE-20364, HIVE-20549 to branch-3

2023-10-16 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-27784:

Affects Version/s: 3.1.3
   (was: 3.2.0)

> Backport of HIVE-20364, HIVE-20549 to branch-3
> --
>
> Key: HIVE-27784
> URL: https://issues.apache.org/jira/browse/HIVE-27784
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.1.3
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>






[jira] [Resolved] (HIVE-27784) Backport of HIVE-20364, HIVE-20549 to branch-3

2023-10-16 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-27784.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

> Backport of HIVE-20364, HIVE-20549 to branch-3
> --
>
> Key: HIVE-27784
> URL: https://issues.apache.org/jira/browse/HIVE-27784
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>






[jira] [Updated] (HIVE-27795) Fix explainanalyze_2.q test

2023-10-16 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-27795:

Labels: release-3.2.0-blocker  (was: )

> Fix explainanalyze_2.q test
> ---
>
> Key: HIVE-27795
> URL: https://issues.apache.org/jira/browse/HIVE-27795
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: release-3.2.0-blocker
> Fix For: 3.2.0
>
>
> The test fails with the following diff:
>  
> Client Execution succeeded but contained differences (error code = 1) after 
> executing explainanalyze_2.q 
> 2043c2043
> < default@srcpart,c_n3,Tbl:COMPLETE,Col:PARTIAL,Output:["key"]
> ---
> > default@srcpart,c_n3,Tbl:COMPLETE,Col:COMPLETE,Output:["key"]
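
The two lines in the diff above differ in a single comma-separated field. A small Python sketch that pinpoints the difference is shown here; the helper name differing_fields is an assumption, and the two strings are copied from the "<" and ">" lines of the diff without assuming which side is the expected output.

```python
LEFT = 'default@srcpart,c_n3,Tbl:COMPLETE,Col:PARTIAL,Output:["key"]'    # "<" line
RIGHT = 'default@srcpart,c_n3,Tbl:COMPLETE,Col:COMPLETE,Output:["key"]'  # ">" line


def differing_fields(left, right):
    """Return (left, right) pairs for the comma-separated fields that differ."""
    return [
        (l, r)
        for l, r in zip(left.split(","), right.split(","))
        if l != r
    ]


print(differing_fields(LEFT, RIGHT))
```

Only the column statistics state differs (Col:PARTIAL versus Col:COMPLETE); the table name, alias, table statistics state, and output columns are identical.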


