[jira] [Commented] (HIVE-28524) Iceberg: Major QB Compaction add sort order support

2024-10-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888238#comment-17888238
 ] 

Simhadri Govindappa commented on HIVE-28524:


Change merged to master.
Thanks [~difin] for the PR! 

> Iceberg: Major QB Compaction add sort order support
> ---
>
> Key: HIVE-28524
> URL: https://issues.apache.org/jira/browse/HIVE-28524
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Hive
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28561) Upgrade Hive 4.0

2024-10-07 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-28561:
--

 Summary: Upgrade Hive 4.0
 Key: HIVE-28561
 URL: https://issues.apache.org/jira/browse/HIVE-28561
 Project: Hive
  Issue Type: Improvement
  Security Level: Public (Viewable by anyone)
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


Hive 4.0 has been released; we would like to upgrade the version of Hive used 
in Ranger to 4.0.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28373) Iceberg: Refactor the code of HadoopTableOptions

2024-09-11 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-28373:
---
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Iceberg: Refactor the code of HadoopTableOptions
> 
>
> Key: HIVE-28373
> URL: https://issues.apache.org/jira/browse/HIVE-28373
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Assignee: yongzhi.shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Since there are a lot of problems with hadoop_catalog, we submitted the 
> following PR to the Iceberg community: 
> [core: Refactor the code of HadoopTableOptions by BsoBird · Pull Request 
> #10623 · apache/iceberg|https://github.com/apache/iceberg/pull/10623]
> With this PR, we can implement atomic operations based on HadoopCatalog.
> But this PR was not accepted by the Iceberg community, and it seems that the 
> Iceberg community is trying to remove support for HadoopCatalog (keeping it 
> only for testing).
> Since Hive itself supports a number of features based on hadoop_catalog 
> tables, can we merge this patch in Hive?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28373) Fix HadoopCatalog based table

2024-09-11 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880929#comment-17880929
 ] 

Simhadri Govindappa commented on HIVE-28373:


Change has been merged to master. 

Thanks [~lisoda] for the PR!!

Thanks [~dkuzmenko] , [~ayushtkn] for the review!!

> Fix HadoopCatalog based table
> -
>
> Key: HIVE-28373
> URL: https://issues.apache.org/jira/browse/HIVE-28373
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Assignee: yongzhi.shao
>Priority: Major
>  Labels: pull-request-available
>
> Since there are a lot of problems with hadoop_catalog, we submitted the 
> following PR to the Iceberg community: 
> [core: Refactor the code of HadoopTableOptions by BsoBird · Pull Request 
> #10623 · apache/iceberg|https://github.com/apache/iceberg/pull/10623]
> With this PR, we can implement atomic operations based on HadoopCatalog.
> But this PR was not accepted by the Iceberg community, and it seems that the 
> Iceberg community is trying to remove support for HadoopCatalog (keeping it 
> only for testing).
> Since Hive itself supports a number of features based on hadoop_catalog 
> tables, can we merge this patch in Hive?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28373) Iceberg: Refactor the code of HadoopTableOptions

2024-09-11 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-28373:
---
Summary: Iceberg: Refactor the code of HadoopTableOptions  (was: Fix 
HadoopCatalog based table)

> Iceberg: Refactor the code of HadoopTableOptions
> 
>
> Key: HIVE-28373
> URL: https://issues.apache.org/jira/browse/HIVE-28373
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Assignee: yongzhi.shao
>Priority: Major
>  Labels: pull-request-available
>
> Since there are a lot of problems with hadoop_catalog, we submitted the 
> following PR to the Iceberg community: 
> [core: Refactor the code of HadoopTableOptions by BsoBird · Pull Request 
> #10623 · apache/iceberg|https://github.com/apache/iceberg/pull/10623]
> With this PR, we can implement atomic operations based on HadoopCatalog.
> But this PR was not accepted by the Iceberg community, and it seems that the 
> Iceberg community is trying to remove support for HadoopCatalog (keeping it 
> only for testing).
> Since Hive itself supports a number of features based on hadoop_catalog 
> tables, can we merge this patch in Hive?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28303) Capture build scans on ge.apache.org to benefit from deep build insights

2024-06-19 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa reassigned HIVE-28303:
--

Assignee: Simhadri Govindappa

> Capture build scans on ge.apache.org to benefit from deep build insights
> 
>
> Key: HIVE-28303
> URL: https://issues.apache.org/jira/browse/HIVE-28303
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Gasper Kojek
>Assignee: Simhadri Govindappa
>Priority: Minor
>  Labels: pull-request-available
>
> This improvement will enhance the functionality of the Hive build by 
> publishing build scans to [ge.apache.org|https://ge.apache.org/], hosted by 
> the Apache Software Foundation and run in partnership between the ASF and 
> Gradle. This Develocity instance has all features and extensions enabled and 
> is freely available for use by the Apache Hive project and all other Apache 
> projects.
> On this Develocity instance, Apache Hive will have access not only to all of 
> the published build scans but also to other aggregate data features such as:
>  * Dashboards to view all historical build scans, along with performance 
> trends over time
>  * Build failure analytics for enhanced investigation and diagnosis of build 
> failures
>  * Test failure analytics to better understand trends and causes around slow, 
> failing, and flaky tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28303) Capture build scans on ge.apache.org to benefit from deep build insights

2024-06-19 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa reassigned HIVE-28303:
--

Assignee: Gasper Kojek  (was: Simhadri Govindappa)

> Capture build scans on ge.apache.org to benefit from deep build insights
> 
>
> Key: HIVE-28303
> URL: https://issues.apache.org/jira/browse/HIVE-28303
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Gasper Kojek
>Assignee: Gasper Kojek
>Priority: Minor
>  Labels: pull-request-available
>
> This improvement will enhance the functionality of the Hive build by 
> publishing build scans to [ge.apache.org|https://ge.apache.org/], hosted by 
> the Apache Software Foundation and run in partnership between the ASF and 
> Gradle. This Develocity instance has all features and extensions enabled and 
> is freely available for use by the Apache Hive project and all other Apache 
> projects.
> On this Develocity instance, Apache Hive will have access not only to all of 
> the published build scans but also to other aggregate data features such as:
>  * Dashboards to view all historical build scans, along with performance 
> trends over time
>  * Build failure analytics for enhanced investigation and diagnosis of build 
> failures
>  * Test failure analytics to better understand trends and causes around slow, 
> failing, and flaky tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28303) Capture build scans on ge.apache.org to benefit from deep build insights

2024-06-19 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-28303.

Fix Version/s: 4.1.0
   Resolution: Fixed

> Capture build scans on ge.apache.org to benefit from deep build insights
> 
>
> Key: HIVE-28303
> URL: https://issues.apache.org/jira/browse/HIVE-28303
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Gasper Kojek
>Assignee: Gasper Kojek
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> This improvement will enhance the functionality of the Hive build by 
> publishing build scans to [ge.apache.org|https://ge.apache.org/], hosted by 
> the Apache Software Foundation and run in partnership between the ASF and 
> Gradle. This Develocity instance has all features and extensions enabled and 
> is freely available for use by the Apache Hive project and all other Apache 
> projects.
> On this Develocity instance, Apache Hive will have access not only to all of 
> the published build scans but also to other aggregate data features such as:
>  * Dashboards to view all historical build scans, along with performance 
> trends over time
>  * Build failure analytics for enhanced investigation and diagnosis of build 
> failures
>  * Test failure analytics to better understand trends and causes around slow, 
> failing, and flaky tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28303) Capture build scans on ge.apache.org to benefit from deep build insights

2024-06-19 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856292#comment-17856292
 ] 

Simhadri Govindappa commented on HIVE-28303:


Change has been merged to master.

Thanks [~gkojek] for the PR. 

> Capture build scans on ge.apache.org to benefit from deep build insights
> 
>
> Key: HIVE-28303
> URL: https://issues.apache.org/jira/browse/HIVE-28303
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Gasper Kojek
>Assignee: Gasper Kojek
>Priority: Minor
>  Labels: pull-request-available
>
> This improvement will enhance the functionality of the Hive build by 
> publishing build scans to [ge.apache.org|https://ge.apache.org/], hosted by 
> the Apache Software Foundation and run in partnership between the ASF and 
> Gradle. This Develocity instance has all features and extensions enabled and 
> is freely available for use by the Apache Hive project and all other Apache 
> projects.
> On this Develocity instance, Apache Hive will have access not only to all of 
> the published build scans but also to other aggregate data features such as:
>  * Dashboards to view all historical build scans, along with performance 
> trends over time
>  * Build failure analytics for enhanced investigation and diagnosis of build 
> failures
>  * Test failure analytics to better understand trends and causes around slow, 
> failing, and flaky tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-16 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846837#comment-17846837
 ] 

Simhadri Govindappa commented on HIVE-28249:


Thanks, [~dkuzmenko] and [~zabetak], for the review and all the help :)
The change is merged to master.

 

It looks like the jodd authors have acknowledged it as a bug: 
[https://github.com/oblac/jodd-util/issues/21].

 

 

> Parquet legacy timezone conversion converts march 1st to 29th feb and fails 
> with not a leap year exception
> --
>
> Key: HIVE-28249
> URL: https://issues.apache.org/jira/browse/HIVE-28249
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> When handling legacy timestamp conversions in Parquet, 'February 29' of year 
> '200' is an edge case.
> This is because, according to [https://www.lanl.gov/Caesar/node202.html], the 
> Julian day for 200 CE/02/29 in the Julian calendar is different from the 
> Julian day in the Gregorian calendar.
> ||Date (BC/AD)||Date (CE)||Julian Day (Julian Calendar)||Julian Day (Gregorian Calendar)||
> |200 AD/02/28|200 CE/02/28|1794166|1794167|
> |200 AD/02/29|200 CE/02/29|1794167|1794168|
> |200 AD/03/01|200 CE/03/01|1794168|1794168|
> |300 AD/02/28|300 CE/02/28|1830691|1830691|
> |300 AD/02/29|300 CE/02/29|1830692|1830692|
> |300 AD/03/01|300 CE/03/01|1830693|1830692|
>  
>  * Because of this:
> {noformat}
> int julianDay = nt.getJulianDay(); {noformat}
> returns Julian day 1794167:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92]
>  * Later:
> {noformat}
> Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat}
> {{formatter.format(date)}} returns 29-02-200, as it seems to be using the 
> Julian calendar, but {{Timestamp.valueOf(29-02-200)}} seems to be using the 
> Gregorian calendar and fails with a "not a leap year" exception for 29th Feb 
> 200:
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196]
> Since Hive stores timestamps in UTC, when converting 200 CE/03/01 between 
> timezones, Hive runs into an exception and fails with a "not a leap year" 
> exception for 29th Feb 200, even if the actual record inserted was 200 
> CE/03/01 in the Asia/Singapore timezone.
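> As a quick cross-check (a minimal illustrative sketch, not Hive code), the 
> Julian-vs-Gregorian divergence for year 200 can be reproduced with 
> {{java.util.GregorianCalendar}}:
> {noformat}
> import java.util.Date;
> import java.util.GregorianCalendar;
> 
> // Pure Julian behaviour: push the Gregorian cutover into the far future.
> GregorianCalendar julian = new GregorianCalendar();
> julian.setGregorianChange(new Date(Long.MAX_VALUE));
> 
> // Proleptic Gregorian behaviour: push the cutover into the far past.
> GregorianCalendar gregorian = new GregorianCalendar();
> gregorian.setGregorianChange(new Date(Long.MIN_VALUE));
> 
> julian.isLeapYear(200);    // true  -> the Julian calendar has 200-02-29
> gregorian.isLeapYear(200); // false -> the Gregorian calendar does not {noformat}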
>  
> Full stack trace:
> {noformat}
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
> block -1 in file 
> file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000
>     at 
> org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
>     at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>     at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>     at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116)
>     at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>     at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org

[jira] [Resolved] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-16 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-28249.

Fix Version/s: 4.1.0
   Resolution: Fixed

> Parquet legacy timezone conversion converts march 1st to 29th feb and fails 
> with not a leap year exception
> --
>
> Key: HIVE-28249
> URL: https://issues.apache.org/jira/browse/HIVE-28249
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When handling legacy timestamp conversions in Parquet, 'February 29' of year 
> '200' is an edge case.
> This is because, according to [https://www.lanl.gov/Caesar/node202.html], the 
> Julian day for 200 CE/02/29 in the Julian calendar is different from the 
> Julian day in the Gregorian calendar.
> ||Date (BC/AD)||Date (CE)||Julian Day (Julian Calendar)||Julian Day (Gregorian Calendar)||
> |200 AD/02/28|200 CE/02/28|1794166|1794167|
> |200 AD/02/29|200 CE/02/29|1794167|1794168|
> |200 AD/03/01|200 CE/03/01|1794168|1794168|
> |300 AD/02/28|300 CE/02/28|1830691|1830691|
> |300 AD/02/29|300 CE/02/29|1830692|1830692|
> |300 AD/03/01|300 CE/03/01|1830693|1830692|
>  
>  * Because of this:
> {noformat}
> int julianDay = nt.getJulianDay(); {noformat}
> returns Julian day 1794167:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92]
>  * Later:
> {noformat}
> Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat}
> {{formatter.format(date)}} returns 29-02-200, as it seems to be using the 
> Julian calendar, but {{Timestamp.valueOf(29-02-200)}} seems to be using the 
> Gregorian calendar and fails with a "not a leap year" exception for 29th Feb 
> 200:
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196]
> Since Hive stores timestamps in UTC, when converting 200 CE/03/01 between 
> timezones, Hive runs into an exception and fails with a "not a leap year" 
> exception for 29th Feb 200, even if the actual record inserted was 200 
> CE/03/01 in the Asia/Singapore timezone.
>  
> Full stack trace:
> {noformat}
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
> block -1 in file 
> file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000
>     at 
> org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
>     at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>     at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>     at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116)
>     at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>     at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
>     at org.junit.runners.Pare

[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291
 ] 

Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 2:24 PM:
-

Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or an earlier version of 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_sgt values ('0200-03-01 00:00:00'){noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200 and so on, up 
until 1582 when the Gregorian calendar came into use, i.e. wherever the two 
calendars overlap.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.
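
To make the rule difference concrete, here is a minimal illustrative sketch of 
the two leap-year predicates (not Hive or jodd code, just the arithmetic):
{noformat}
// Julian rule: every 4th year is a leap year.
static boolean isJulianLeapYear(int year) {
  return year % 4 == 0;
}

// Gregorian rule: century years are leap years only if divisible by 400.
static boolean isGregorianLeapYear(int year) {
  return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}

// isJulianLeapYear(200)    -> true
// isGregorianLeapYear(200) -> false (divisible by 100 but not by 400) {noformat}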

 

*Is this a regression caused by another ticket?*

Not sure yet; I am looking for it but haven't found the ticket that caused this 
regression.

 


was (Author: simhadri-g):
Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or an earlier version of 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_sgt select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200 and so on, up 
until 1582 when the Gregorian calendar came into use, i.e. wherever the two 
calendars overlap.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another

[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291
 ] 

Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 2:17 PM:
-

Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or an earlier version of 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_sgt select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200 and so on, up 
until 1582 when the Gregorian calendar came into use, i.e. wherever the two 
calendars overlap.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I am looking for it but haven't found the ticket that caused this 
regression.

 


was (Author: simhadri-g):
Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or an earlier version of 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_sgt select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200, 300, 500, 600, 
and so on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not

[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291
 ] 

Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 1:56 PM:
-

Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or an earlier version of 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_sgt select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200, 300, 500, 600, 
and so on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I am looking for it but haven't found the ticket that caused this 
regression.

 


was (Author: simhadri-g):
Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or an earlier version of 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200, 300, 500, 600, 
and so on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet, I 

[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291
 ] 

Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 11:03 AM:
--

Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or an earlier version of 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200, 300, 500, 600, 
and so on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I am looking for it but haven't found the ticket that caused this 
regression.

 


was (Author: simhadri-g):
Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200, 300, 500, 600, 
and so on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet, I am looking for it 

[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291
 ] 

Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 10:47 AM:
--

Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It is relevant for other century years as well, such as 200, 300, 500, 600, 
and so on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I am looking for it but haven't found the ticket that caused this 
regression.

 


was (Author: simhadri-g):
Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It affects other century years as well, such as 200, 300, 500, 600, and so 
on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet, I am looking for it but haven't found the ticket that caused this 

[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291
 ] 

Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 10:46 AM:
--

Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}
 

*Step 2:* After migrating the data file for this table with BDR to Hive 4 (with 
Parquet version 1.10.x), we can recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It affects other century years as well, such as 200, 300, 500, 600, and so 
on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I am looking for it but haven't found the ticket that caused this 
regression.

 


was (Author: simhadri-g):
Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}
*Step 2:* After migrating the data file for this table with BDR to Hive 4 with 
Parquet version 1.10.x, we recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It affects other century years as well, such as 200, 300, 500, 600, and so 
on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I am looking for it but haven't found the ticket that caused this 
regression.

 

> Parquet legacy ti

[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291
 ] 

Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 10:44 AM:
--

Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}
 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}
*Step 2:* After migrating the data file for this table with BDR to Hive 4 with 
Parquet version 1.10.x, we recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is a cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or does it affect other dates as well?*

It affects other century years as well, such as 200, 300, 500, 600, and so 
on, up until 1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I am looking for it but haven't found the ticket that caused this 
regression.

 


was (Author: simhadri-g):
Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the GitHub PR with the stack trace. I have added 
more details in the PR description and will add them here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well.

*Step 1:* In Hive 2.1.1 or 3.x, with Parquet 1.8:
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}

 # Insert record with date 0200-03-01

{noformat}
insert into default.test_vj select '0200-03-01 00:00:00'{noformat}

*Step 2:* After migrating the data file for this table with BDR to Hive 4 with 
Parquet version 1.10.x, we recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or it affects other dates as well?*

Other century years such as 200, 300, 500, 600 and so on are affected, up until 
1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I haven't been able to find the ticket.

 



> Parquet legacy timezone conversion converts march 1st to 29

[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-28249:
---
Description: 
When handling legacy timestamp conversions in Parquet, 'February 29' of year 
'200' is an edge case.
This is because, according to this: [https://www.lanl.gov/Caesar/node202.html]
The Julian day for 200 CE/02/29 in the Julian calendar is different from the 
Julian day in the Gregorian calendar.
||Date (BC/AD)||Date (CE)||Julian Day||Julian Day||
|-|  -|(Julian Calendar)|(Gregorian Calendar)|
|200 AD/02/28|200 CE/02/28|1794166|1794167|
|200 AD/02/29|200 CE/02/29|1794167|1794168|
|200 AD/03/01|200 CE/03/01|1794168|1794168|
|300 AD/02/28|300 CE/02/28|1830691|1830691|
|300 AD/02/29|300 CE/02/29|1830692|1830692|
|300 AD/03/01|300 CE/03/01|1830693|1830692|

 
 * Because of this:

{noformat}
int julianDay = nt.getJulianDay(); {noformat}
returns Julian day 1794167 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92]
 * Later :

{noformat}
Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat}
{{formatter.format(date)}} returns 29-02-200 as it seems to be using the Julian 
calendar, but {{Timestamp.valueOf(29-02-200)}} seems to be using the Gregorian 
calendar and fails with a "not a leap year" exception for 29th Feb 200.
[https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196]

Since Hive stores timestamps in UTC, when converting 200 CE/03/01 between 
timezones Hive runs into an exception and fails with a "not a leap year" 
exception for 29th Feb 200, even if the actual record inserted was 200 CE/03/01 
in the Asia/Singapore timezone.
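
Here is a minimal standalone sketch of that failure path, assuming UTC and 
using only JDK classes (taking from the table above that Julian Day 1794167 is 
0200-02-29 in the Julian calendar, and that the JDN of 1970-01-01 is 2440588):
{code:java}
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.Date;
import java.util.TimeZone;

public class LegacyLeapDayMismatch {
    public static void main(String[] args) {
        // Convert Julian Day 1794167 to epoch days, then to an instant.
        long epochDay = 1794167L - 2440588L;
        Date instant = new Date(epochDay * 86_400_000L);

        // SimpleDateFormat is backed by GregorianCalendar, which falls back to
        // the Julian calendar before the 1582 cutover, so this should print
        // 0200-02-29.
        SimpleDateFormat legacy = new SimpleDateFormat("yyyy-MM-dd");
        legacy.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(legacy.format(instant));

        // java.time uses the proleptic Gregorian calendar and rejects the same
        // date: DateTimeException: Invalid date 'February 29' as '200' is not
        // a leap year
        LocalDate.of(200, 2, 29);
    }
}
{code}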

 

Fullstack trace:
{noformat}
java.lang.RuntimeException: java.io.IOException: 
org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
block -1 in file 
file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000
    at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210)
    at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
    at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
    at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116)
    at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
    at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentR

[jira] [Commented] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291
 ] 

Simhadri Govindappa commented on HIVE-28249:


Hi Stamatis,

Thanks for the inputs. 

I have updated the Jira and the github PR with the stacktrace.  I have added 
more details in the PR description, will add it here as well.

*Steps to reproduce:*

I have provided a q file that recreates the issue in the PR as well

*Step 1:* In hive 2.1.1 or 3.x with parquet 1.8 :
 # Create a table in Asia/Singapore Timezone:

{noformat}
create table default.test_sgt(currtime timestamp) stored as parquet;{noformat}

 # Insert record with date 0200-03-01

{noformat}
insert into default.test_sgt select '0200-03-01 00:00:00';{noformat}

*Step 2:* After migrating the datafile for this table with BDR to hive 4 with 
parquet version 1.10.x, we recreate the error by running the following:

 
{noformat}
--! qt:timezone:Asia/Singapore
CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT;

SELECT * FROM TEST_SGT;{noformat}
 

 

*Does this problem affect older versions of Hive?*

Yes, it affects alpha and beta releases as well. 

*Does it appear when reading and writing of Parquet data happens with the same 
version?*

No, we see the issue only when there is cross-version writing and reading.

 

*Does it require some specific properties to be set when reading/writing?*

No, but it does depend on legacy conversions.

 

*Is it relevant only for 200 CE/02/29 or it affects other dates as well?*

Other century years such as 200, 300, 500, 600 and so on are affected, up until 
1582 when the Gregorian calendar came into use.

The Julian calendar defines a leap year as once every four years. The Gregorian 
calendar modified the addition of leap days, such that a century year was only 
counted as a leap year if it was also divisible by 400.

So according to the Julian calendar the years 200 and 300 are leap years, but 
they are not leap years according to the Gregorian calendar. That is why we are 
seeing this issue.

 

*Is this a regression caused by another ticket?*

Not sure yet; I haven't been able to find the ticket.

 



> Parquet legacy timezone conversion converts march 1st to 29th feb and fails 
> with not a leap year exception
> --
>
> Key: HIVE-28249
> URL: https://issues.apache.org/jira/browse/HIVE-28249
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> When handling legacy time stamp conversions in parquet,'February 29' year 
> '200' is an edge case.
> This is because, according to this: [https://www.lanl.gov/Caesar/node202.html]
> The Julian day for 200 CE/02/29 in the Julian calendar is different from the 
> Julian day in Gregorian Calendar .
> ||Date (BC/AD)||Date (CE)||Julian Day||Julian Day||
> |-|  -|(Julian Calendar)|(Gregorian Calendar)|
> |200 AD/02/28|200 CE/02/28|1794166|1794167|
> |200 AD/02/29|200 CE/02/29|1794167|1794168|
> |200 AD/03/01|200 CE/03/01|1794168|1794168|
> |300 AD/02/28|300 CE/02/28|1830691|1830691|
> |300 AD/02/29|300 CE/02/29|1830692|1830692|
> |300 AD/03/01|300 CE/03/01|1830693|1830692|
>  
>  * Because of this:
> {noformat}
> int julianDay = nt.getJulianDay(); {noformat}
> returns julian day 1794167 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92]
>  * Later :
> {noformat}
> Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat}
> _{{{}formatter.format(date{}}})_ returns 29-02-200 as it seems to be using 
> julian calendar
> but _{{Timestamp.valueOf(29-02-200)}}_ seems to be using gregorian calendar 
> and fails with "not a leap year exception" for 29th Feb 200"
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196]
> Since hive stores timestamp in UTC, when converting 200 CE/03/01 between 
> timezones, hive runs into an exception and fails with "not a leap year 
> exception" for 29th Feb 200 even if the actual record inserted was 200 
> CE/03/01 in Asia/Singapore timezone.
>  
> Fullstack trace:
> {noformat}
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
> block -1 in file 
> file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtes

[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-28249:
---
Description: 
When handling legacy timestamp conversions in Parquet, 'February 29' of year 
'200' is an edge case.
This is because, according to this: [https://www.lanl.gov/Caesar/node202.html]
The Julian day for 200 CE/02/29 in the Julian calendar is different from the 
Julian day in the Gregorian calendar.
||Date (BC/AD)||Date (CE)||Julian Day||Julian Day||
|-|  -|(Julian Calendar)|(Gregorian Calendar)|
|200 AD/02/28|200 CE/02/28|1794166|1794167|
|200 AD/02/29|200 CE/02/29|1794167|1794168|
|200 AD/03/01|200 CE/03/01|1794168|1794168|
|300 AD/02/28|300 CE/02/28|1830691|1830691|
|300 AD/02/29|300 CE/02/29|1830692|1830692|
|300 AD/03/01|300 CE/03/01|1830693|1830692|

Because of this:

{noformat}
int julianDay = nt.getJulianDay(); {noformat}
returns Julian day 1794167 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92]

Later:

{noformat}
Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat}
{{formatter.format(date)}} returns 29-02-200 as it seems to be using the Julian 
calendar, but {{Timestamp.valueOf(29-02-200)}} seems to be using the Gregorian 
calendar and fails with a "not a leap year" exception for 29th Feb 200.
[https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196]

Since Hive stores timestamps in UTC, when converting 200 CE/03/01 between 
timezones Hive runs into an exception and fails with a "not a leap year" 
exception for 29th Feb 200, even if the actual record inserted was 200 CE/03/01 
in the Asia/Singapore timezone.

 

Fullstack trace:
{noformat}
java.lang.RuntimeException: java.io.IOException: 
org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
block -1 in file 
file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000
    at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210)
    at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
    at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
    at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116)
    at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
    at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.juni

[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-10 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-28249:
---
Description: 
When handling legacy timezone conversions in Parquet, 'February 29' of year 
'200' is an edge case.

This is because, according to this: [https://www.lanl.gov/Caesar/node202.html]

The Julian day for 200 CE/02/29 in the Julian calendar is different from the 
Julian day in the Gregorian calendar.
|Date (BC/AD)|Date (CE)|Julian Day|Julian Day|
| | |(Julian Calendar)|(Gregorian Calendar)|
|200 AD/02/28|200 CE/02/28|1794166|1794167|
|200 AD/02/29|200 CE/02/29|1794167|1794168|
|200 AD/03/01|200 CE/03/01|1794168|1794168|

As a result, since Hive stores timestamps in UTC, when converting 200 CE/03/01 
between timezones Hive runs into an exception and fails with a "not a leap 
year" exception for 29th Feb 200, even if the actual record inserted was 200 
CE/03/01 in the Asia/Singapore timezone.

 

 

Fullstack trace:
{noformat}
java.lang.RuntimeException: java.io.IOException: 
org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
block -1 in file 
file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000
    at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210)
    at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
    at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
    at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116)
    at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
    at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at 
org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:95)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at 
org.apache.maven

[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-07 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-28249:
---
Description: 
When handling legacy timezone conversions in Parquet, 'February 29' of year 
'200' is an edge case.

This is because, according to this: [https://www.lanl.gov/Caesar/node202.html]

The Julian day for 200 CE/02/29 in the Julian calendar is different from the 
Julian day in the Gregorian calendar.
|Date (BC/AD)|Date (CE)|Julian Day|Julian Day|
| | |(Julian Calendar)|(Gregorian Calendar)|
|200 AD/02/28|200 CE/02/28|1794166|1794167|
|200 AD/02/29|200 CE/02/29|1794167|1794168|
|200 AD/03/01|200 CE/03/01|1794168|1794168|

As a result, since Hive stores timestamps in UTC, when converting 200 CE/03/01 
between timezones Hive runs into an exception and fails with a "not a leap 
year" exception for 29th Feb 200, even if the actual record inserted was 200 
CE/03/01 in the Asia/Singapore timezone.

> Parquet legacy timezone conversion converts march 1st to 29th feb and fails 
> with not a leap year exception
> --
>
> Key: HIVE-28249
> URL: https://issues.apache.org/jira/browse/HIVE-28249
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> When handling legacy timezone conversions in parquet,  'February 29' year 
> '200' is an edge case. 
> This is because, according to this: [https://www.lanl.gov/Caesar/node202.html]
> The Julian day for 200 CE/02/29 in the Julian calendar is different from the 
> Julian day in Gregorian Calendar .
> |Date (BC/AD)|Date (CE)|Julian Day|Julian Day|
> | | |(Julian Calendar)|(Gregorian Calendar)|
> |200 AD/02/28|200 CE/02/28|1794166|1794167|
> |200 AD/02/29|200 CE/02/29|1794167|1794168|
> |200 AD/03/01|200 CE/03/01|1794168|1794168|
> As a result since hive stores timestamp in UTC, when converting 200 CE/03/01 
> between timezones, hive runs into an exception and fails with "not a leap 
> year exception" for 29th Feb 200 even if the actual record inserted was 200 
> CE/03/01 in Asia/Singapore timezone.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-05-07 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-28249:
--

 Summary: Parquet legacy timezone conversion converts march 1st to 
29th feb and fails with not a leap year exception
 Key: HIVE-28249
 URL: https://issues.apache.org/jira/browse/HIVE-28249
 Project: Hive
  Issue Type: Task
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28211) Restore hive-exec-core jar

2024-04-24 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-28211:
--

 Summary: Restore hive-exec-core jar
 Key: HIVE-28211
 URL: https://issues.apache.org/jira/browse/HIVE-28211
 Project: Hive
  Issue Type: Task
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


The hive-exec-core jar is used by Spark, Oozie, Hudi and many other projects. 
Removal of the hive-exec-core jar has caused the following issues.

Spark : [https://lists.apache.org/list?d...@hive.apache.org:lte=1M:joda]
Oozie: [https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg]
Hudi: [apache/hudi#8147|https://github.com/apache/hudi/issues/8147]

Until we shade & relocate dependencies in hive-exec, we should restore the 
hive-exec-core jar.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate

2024-04-15 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-28153.

Fix Version/s: 4.1.0
   Resolution: Fixed

> Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
> --
>
> Key: HIVE-28153
> URL: https://issues.apache.org/jira/browse/HIVE-28153
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Butao Zhang
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> This test has been failing a lot lately, such as 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/]
>  
> And the flaky test shows this test is unstable:
> [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/]
> {code:java}
> 10:29:21  [INFO]  T E S T S
> 10:29:21  [INFO] ---
> 10:29:21  [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time 
> elapsed: 399.12 s <<< FAILURE! - in 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET,
>  engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1]  Time 
> elapsed: 11.781 s  <<< FAILURE!
> 10:36:13  java.lang.AssertionError: expected:<12> but was:<13>
> 10:36:13  at org.junit.Assert.fail(Assert.java:89)
> 10:36:13  at org.junit.Assert.failNotEquals(Assert.java:835)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:647)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:633)
> 10:36:13  at 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135)
> 10:36:13  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 10:36:13  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 10:36:13  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 10:36:13  at java.lang.reflect.Method.invoke(Method.java:498)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 10:36:13  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 10:36:13  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 10:36:13  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 10:36:13  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 10:36:13  at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate

2024-04-15 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837165#comment-17837165
 ] 

Simhadri Govindappa commented on HIVE-28153:


Change is merged to master.
Thanks, [~dkuzmenko]  and [~zhangbutao]  for the review!


Additionally, I have raised HIVE-28192 to investigate the bug mentioned above. 
It seems the IOContext is shared between threads in the non-vectorized code 
flow, which is causing duplicate records.
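
For context, here is a minimal sketch (hypothetical names, not Hive's actual 
IOContext API) of the kind of per-thread confinement that would avoid this 
sharing:
{code:java}
// Hypothetical sketch: give each task thread its own context object instead of
// sharing one instance, so concurrent readers cannot clobber each other's
// position-delete state.
public final class IOContextSketch {
    private static final ThreadLocal<IOContextSketch> CONTEXT =
        ThreadLocal.withInitial(IOContextSketch::new);

    private long filePosition; // stands in for fields like PositionDeleteInfo

    private IOContextSketch() {
    }

    public static IOContextSketch get() {
        return CONTEXT.get(); // each thread sees only its own instance
    }

    public static void clear() {
        CONTEXT.remove(); // avoid leaking state between reused pool threads
    }

    public void setFilePosition(long pos) {
        this.filePosition = pos;
    }

    public long getFilePosition() {
        return filePosition;
    }
}
{code}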

> Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
> --
>
> Key: HIVE-28153
> URL: https://issues.apache.org/jira/browse/HIVE-28153
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Butao Zhang
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> This test has been failing a lot lately, such as 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/]
>  
> And the flaky test shows this test is unstable:
> [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/]
> {code:java}
> 10:29:21  [INFO]  T E S T S
> 10:29:21  [INFO] ---
> 10:29:21  [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time 
> elapsed: 399.12 s <<< FAILURE! - in 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET,
>  engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1]  Time 
> elapsed: 11.781 s  <<< FAILURE!
> 10:36:13  java.lang.AssertionError: expected:<12> but was:<13>
> 10:36:13  at org.junit.Assert.fail(Assert.java:89)
> 10:36:13  at org.junit.Assert.failNotEquals(Assert.java:835)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:647)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:633)
> 10:36:13  at 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135)
> 10:36:13  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 10:36:13  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 10:36:13  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 10:36:13  at java.lang.reflect.Method.invoke(Method.java:498)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 10:36:13  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 10:36:13  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 10:36:13  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 10:36:13  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 10:36:13  at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28192) Iceberg: Fix thread safety issue with PositionDeleteInfo in IOContext

2024-04-09 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-28192:
--

 Summary: Iceberg: Fix thread safety issue with PositionDeleteInfo 
in IOContext
 Key: HIVE-28192
 URL: https://issues.apache.org/jira/browse/HIVE-28192
 Project: Hive
  Issue Type: Task
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate

2024-04-01 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832862#comment-17832862
 ] 

Simhadri Govindappa edited comment on HIVE-28153 at 4/1/24 2:34 PM:


I investigated the issue. 

Looks like the test caught a new bug in the code of the iceberg delete writer. 
I will add more details soon. 


was (Author: simhadri-g):
I investigated the issue. 

Looks like the test caught the new bug in the code of the iceberg delete 
writer. I will add more details soon. 

> Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
> --
>
> Key: HIVE-28153
> URL: https://issues.apache.org/jira/browse/HIVE-28153
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Butao Zhang
>Assignee: Simhadri Govindappa
>Priority: Major
>
> This test has been failing a lot lately, such as 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/]
>  
> And the flaky test shows this test is unstable:
> [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/]
> {code:java}
> 10:29:21  [INFO]  T E S T S
> 10:29:21  [INFO] ---
> 10:29:21  [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time 
> elapsed: 399.12 s <<< FAILURE! - in 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET,
>  engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1]  Time 
> elapsed: 11.781 s  <<< FAILURE!
> 10:36:13  java.lang.AssertionError: expected:<12> but was:<13>
> 10:36:13  at org.junit.Assert.fail(Assert.java:89)
> 10:36:13  at org.junit.Assert.failNotEquals(Assert.java:835)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:647)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:633)
> 10:36:13  at 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135)
> 10:36:13  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 10:36:13  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 10:36:13  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 10:36:13  at java.lang.reflect.Method.invoke(Method.java:498)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 10:36:13  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 10:36:13  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 10:36:13  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 10:36:13  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 10:36:13  at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate

2024-04-01 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832862#comment-17832862
 ] 

Simhadri Govindappa commented on HIVE-28153:


I investigated the issue. 

Looks like the test caught the new bug in the code of the iceberg delete 
writer. I will add more details soon. 

> Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
> --
>
> Key: HIVE-28153
> URL: https://issues.apache.org/jira/browse/HIVE-28153
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Butao Zhang
>Assignee: Simhadri Govindappa
>Priority: Major
>
> This test has been failing a lot lately, such as 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/]
>  
> And the flaky test shows this test is unstable:
> [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/]
> {code:java}
> 10:29:21  [INFO]  T E S T S
> 10:29:21  [INFO] ---
> 10:29:21  [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time 
> elapsed: 399.12 s <<< FAILURE! - in 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET,
>  engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1]  Time 
> elapsed: 11.781 s  <<< FAILURE!
> 10:36:13  java.lang.AssertionError: expected:<12> but was:<13>
> 10:36:13  at org.junit.Assert.fail(Assert.java:89)
> 10:36:13  at org.junit.Assert.failNotEquals(Assert.java:835)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:647)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:633)
> 10:36:13  at 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135)
> 10:36:13  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 10:36:13  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 10:36:13  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 10:36:13  at java.lang.reflect.Method.invoke(Method.java:498)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 10:36:13  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 10:36:13  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 10:36:13  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 10:36:13  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 10:36:13  at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate

2024-03-27 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa reassigned HIVE-28153:
--

Assignee: Simhadri Govindappa

> Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
> --
>
> Key: HIVE-28153
> URL: https://issues.apache.org/jira/browse/HIVE-28153
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Butao Zhang
>Assignee: Simhadri Govindappa
>Priority: Major
>
> This test has been failing a lot lately, such as 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/]
>  
> And the flaky test shows this test is unstable:
> [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/]
> {code:java}
> 10:29:21  [INFO]  T E S T S
> 10:29:21  [INFO] ---
> 10:29:21  [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time 
> elapsed: 399.12 s <<< FAILURE! - in 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET,
>  engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1]  Time 
> elapsed: 11.781 s  <<< FAILURE!
> 10:36:13  java.lang.AssertionError: expected:<12> but was:<13>
> 10:36:13  at org.junit.Assert.fail(Assert.java:89)
> 10:36:13  at org.junit.Assert.failNotEquals(Assert.java:835)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:647)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:633)
> 10:36:13  at 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135)
> 10:36:13  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 10:36:13  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 10:36:13  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 10:36:13  at java.lang.reflect.Method.invoke(Method.java:498)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 10:36:13  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 10:36:13  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 10:36:13  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 10:36:13  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 10:36:13  at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27929) Run TPC-DS queries and validate results correctness

2024-03-25 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830442#comment-17830442
 ] 

Simhadri Govindappa commented on HIVE-27929:


Hi 

I have rerun the 1 TB TPC-DS test for branch-4.0 
([https://github.com/apache/hive/tree/branch-4.0]); all the queries ran 
successfully with the correct results. Please find the details of the test 
below:
 # Created a 1 TB TPC-DS dataset (.dat files)
 # Loaded the data files into text tables
 # Created external ORC TPC-DS tables and loaded the data from the text tables
 # Ran the 99 TPC-DS queries

 

> Run TPC-DS queries and validate results correctness
> ---
>
> Key: HIVE-27929
> URL: https://issues.apache.org/jira/browse/HIVE-27929
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Simhadri Govindappa
>Priority: Major
>
> release branch: *branch-4.0*
> https://github.com/apache/hive/tree/branch-4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27953) Retire https://apache.github.io sites and remove obsolete content/actions

2024-03-22 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829802#comment-17829802
 ] 

Simhadri Govindappa commented on HIVE-27953:


Thanks [~zabetak] , for helping with the reviews! :)

> Retire https://apache.github.io sites and remove obsolete content/actions
> -
>
> Key: HIVE-27953
> URL: https://issues.apache.org/jira/browse/HIVE-27953
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Currently there are three versions of the Hive website (populated from 
> different places and in various ways) available online. Below, I outline the 
> entry point URLs along with the latest commit that lead to the deployment 
> each version.
> ||URL||Commit||
> |https://hive.apache.org/|https://github.com/apache/hive-site/commit/0162552c68006fd30411033d5e6a3d6806026851|
> |https://apache.github.io/hive/|https://github.com/apache/hive/commit/1455f6201b0f7b061361bc9acc23cb810ff02483|
> |https://apache.github.io/hive-site/|https://github.com/apache/hive-site/commit/95b1c8385fa50c2e59579899d2fd297b8a2ecefd|
> People searching online for Hive may end-up in any of the above risking to 
> see pretty outdated information about the project. 
> For Hive developers (especially newcomers) it is very difficult to figure out 
> where they should apply their changes if they want to change something in the 
> website. Even people experienced with the various offering of ASF and GitHub 
> may have a hard time figuring things out.
> I propose to retire/shutdown all GitHub pages deployments 
> (https://apache.github.io) and drop all content/branches that are not 
> relevant for the main website under https://hive.apache.org/.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28087) Hive Iceberg: Insert into partitioned table fails if the data is not clustered

2024-03-13 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-28087:
---
Description: 
Insert into partitioned table fails with the following error if the data is not 
clustered.

*Using cluster by clause it succeeds :* 
{noformat}
0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 
select t, ts from t1 cluster by ts;

--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container SUCCEEDED  1  100
   0   0
Reducer 2 .. container SUCCEEDED  1  100
   0   0
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.47 s
--
INFO  : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Completed executing 
command(queryId=root_20240222123244_0c448b32-4fd9-420d-be31-e39e2972af82); 
Time taken: 10.534 seconds
100 rows affected (10.696 seconds){noformat}
 

*Without cluster By it fails:* 
{noformat}
0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 
select t, ts from t1;

--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container SUCCEEDED  1  100
   0   0
Reducer 2container   RUNNING  1  010
   2   0
--
VERTICES: 01/02  [=>>-] 50%   ELAPSED TIME: 9.53 s
--
Caused by: java.lang.IllegalStateException: Incoming records violate the writer 
assumption that records are clustered by spec and by partition within each 
spec. Either cluster the incoming records or switch to fanout writers.
Encountered records that belong to already closed files:
partition 'ts_month=2027-03' in spec [
  1000: ts_month: month(2)
]
at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:96)
at 
org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:31)
at 
org.apache.iceberg.mr.hive.writer.HiveIcebergRecordWriter.write(HiveIcebergRecordWriter.java:53)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1181)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:502)
... 20 more{noformat}
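
To illustrate the writer assumption named in the error, here is a minimal 
standalone sketch (hypothetical, not Iceberg's actual ClusteredWriter) of why a 
clustered writer needs input ordered by partition: it keeps one file open at a 
time and rejects any partition it has already closed.
{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a clustered writer: a partition that reappears after
// its file was closed is an error.
public class ClusteredWriterSketch {
    private final Set<String> closedPartitions = new HashSet<>();
    private String currentPartition;

    void write(String partition, String record) {
        if (!partition.equals(currentPartition)) {
            if (closedPartitions.contains(partition)) {
                throw new IllegalStateException(
                    "Incoming records are not clustered by partition: " + partition);
            }
            if (currentPartition != null) {
                closedPartitions.add(currentPartition); // close the previous file
            }
            currentPartition = partition; // open a file for the new partition
        }
        // append record to the currently open file ...
    }

    public static void main(String[] args) {
        ClusteredWriterSketch writer = new ClusteredWriterSketch();
        // Unsorted partition keys, as produced without CLUSTER BY:
        for (String p : List.of("ts_month=2027-02", "ts_month=2027-03", "ts_month=2027-02")) {
            writer.write(p, "row"); // throws on the third call
        }
    }
}
{code}
CLUSTER BY routes rows through a shuffle keyed on the given expression, so each 
reducer sees its partitions contiguously, which is why the CLUSTER BY variant 
above succeeds.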
 

 

A simple repro, using the attached csv file: 
[^query-hive-377.csv]
{noformat}
create database t3;

use t3;

create table vector1k(
        t int,
        si int,
        i int,
        b bigint,
        f float,
        d double,
        dc decimal(38,18),
        bo boolean,
        s string,
        s2 string,
        ts timestamp,
        ts2 timestamp,
        dt date)
     row format delimited fields terminated by ',';

load data local inpath "/query-hive-377.csv" OVERWRITE into table vector1k; 


select * from vector1k;

create table vectortab10k(
        t int,
        si int,
        i int,
        b bigint,
        f float,
        d double,
        dc decimal(38,18),
        bo boolean,
        s string,
        s2 string,
        ts timestamp,
        ts2 timestamp,
        dt date)
    stored by iceberg
    stored as orc;
    
insert into vectortab10k  select * from vector1k;

select count(*) from vectortab10k ;

create table partition_transform_4(t int, ts timestamp) partitioned by 
spec(month(ts)) stored by iceberg;

insert into table partition_transform_4 select t, ts from vectortab10k ;
{noformat}

  was:
Insert into partitioned table fails with the following error if the data is not 
clustered.

*Using cluster by clause it succeeds :* 
{noformat}
0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 
select t, ts from t1 cluster by ts;

-

[jira] [Commented] (HIVE-28107) Remove the docs directory

2024-03-07 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824325#comment-17824325
 ] 

Simhadri Govindappa commented on HIVE-28107:


Thanks [~zabetak] , I will update the PR. 

> Remove the docs directory
> -
>
> Key: HIVE-28107
> URL: https://issues.apache.org/jira/browse/HIVE-28107
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: Not Applicable
>
>
> The doc directory was used to host the old hive website.  
> Since the revamped hive website was moved to and hosted from 
> [https://github.com/apache/hive-site/]  for almost a year now without any 
> issues, this docs directory in the main repo is no longer required. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28107) Remove the docs directory

2024-03-07 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824314#comment-17824314
 ] 

Simhadri Govindappa edited comment on HIVE-28107 at 3/7/24 8:51 AM:


Hi [~zabetak] ,

Yes, this seems to be a duplicate. Sorry, I was not aware of HIVE-27953.

Shall I mark this Jira as a duplicate and close the PR?


was (Author: simhadri-g):
Hi [~zabetak] ,

Yes this seems to be a duplicate, I was not aware of HIVE-27953 . 

Shall  I mark this Jira as duplicate and close the PR?

> Remove the docs directory
> -
>
> Key: HIVE-28107
> URL: https://issues.apache.org/jira/browse/HIVE-28107
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> The doc directory was used to host the old hive website.  
> Since the revamped hive website was moved to and hosted from 
> [https://github.com/apache/hive-site/]  for almost a year now without any 
> issues, this docs directory in the main repo is no longer required. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28107) Remove the docs directory

2024-03-07 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824314#comment-17824314
 ] 

Simhadri Govindappa commented on HIVE-28107:


Hi [~zabetak] ,

Yes this seems to be a duplicate, I was not aware of HIVE-27953 . 

Shall  I mark this Jira as duplicate and close the PR?

> Remove the docs directory
> -
>
> Key: HIVE-28107
> URL: https://issues.apache.org/jira/browse/HIVE-28107
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> The doc directory was used to host the old hive website.  
> Since the revamped hive website was moved to and hosted from 
> [https://github.com/apache/hive-site/]  for almost a year now without any 
> issues, this docs directory in the main repo is no longer required. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28107) Remove the docs directory

2024-03-06 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-28107:
--

 Summary: Remove the docs directory
 Key: HIVE-28107
 URL: https://issues.apache.org/jira/browse/HIVE-28107
 Project: Hive
  Issue Type: Task
  Components: Hive
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


The docs directory was used to host the old Hive website.

Since the revamped Hive website has been hosted from 
[https://github.com/apache/hive-site/] for almost a year now without any 
issues, the docs directory in the main repo is no longer required.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28087) Hive Iceberg: Insert into partitioned table fails if the data is not clustered

2024-02-22 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-28087:
---
Description: 
Insert into partitioned table fails with the following error if the data is not 
clustered.

*Using cluster by clause it succeeds :* 
{noformat}
0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 
select t, ts from t1 cluster by ts;

--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container SUCCEEDED  1  100
   0   0
Reducer 2 .. container SUCCEEDED  1  100
   0   0
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.47 s
--
INFO  : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Completed executing 
command(queryId=root_20240222123244_0c448b32-4fd9-420d-be31-e39e2972af82); 
Time taken: 10.534 seconds
100 rows affected (10.696 seconds){noformat}
 

*Without cluster By it fails:* 
{noformat}
0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 
select t, ts from t1;

--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container SUCCEEDED  1  100
   0   0
Reducer 2container   RUNNING  1  010
   2   0
--
VERTICES: 01/02  [=>>-] 50%   ELAPSED TIME: 9.53 s
--
Caused by: java.lang.IllegalStateException: Incoming records violate the writer 
assumption that records are clustered by spec and by partition within each 
spec. Either cluster the incoming records or switch to fanout writers.
Encountered records that belong to already closed files:
partition 'ts_month=2027-03' in spec [
  1000: ts_month: month(2)
]
at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:96)
at 
org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:31)
at 
org.apache.iceberg.mr.hive.writer.HiveIcebergRecordWriter.write(HiveIcebergRecordWriter.java:53)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1181)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:502)
... 20 more{noformat}
 

 

A simple repro, using the attached csv file: 
[^query-hive-377.csv]
{noformat}
create database t3;

use t3;

create table vector1k(
        t int,
        si int,
        i int,
        b bigint,
        f float,
        d double,
        dc decimal(38,18),
        bo boolean,
        s string,
        s2 string,
        ts timestamp,
        ts2 timestamp,
        dt date)
     row format delimited fields terminated by ',';

load data local inpath "/query-hive-377.csv" OVERWRITE into table vector1k; 


select * from vector1k;

create table vectortab10k(
        t int,
        si int,
        i int,
        b bigint,
        f float,
        d double,
        dc decimal(38,18),
        bo boolean,
        s string,
        s2 string,
        ts timestamp,
        ts2 timestamp,
        dt date)
    stored by iceberg
    stored as orc;
    
insert into vectortab10k select * from vector1k;

select count(*) from vectortab10k limit 10;

create table partition_transform_4(t int, ts timestamp) partitioned by 
spec(month(ts)) stored by iceberg;

insert into table partition_transform_4 select t, ts from vectortab10k ;
{noformat}
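
The failure mode is easiest to see from the writer's contract. Below is a minimal sketch (hypothetical code, not Iceberg's actual {{ClusteredWriter}}) of the clustering assumption the stack trace above reports: the writer keeps one open file for the current partition and permanently closes it when the next partition starts, so input that revisits an earlier partition is rejected.
{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: records must arrive grouped (clustered) by partition.
class ClusteredWriterSketch {
  private final Set<String> completedPartitions = new HashSet<>();
  private String currentPartition;

  void write(String partition, String record) {
    if (!partition.equals(currentPartition)) {
      if (completedPartitions.contains(partition)) {
        // Mirrors the IllegalStateException in the trace above: the file for
        // this partition was already closed, so the record cannot be appended.
        throw new IllegalStateException("Incoming records violate the writer "
            + "assumption that records are clustered by partition: " + partition);
      }
      if (currentPartition != null) {
        completedPartitions.add(currentPartition); // close the previous file
      }
      currentPartition = partition; // open a file for the new partition
    }
    // append the record to the open file for currentPartition ...
  }

  public static void main(String[] args) {
    ClusteredWriterSketch writer = new ClusteredWriterSketch();
    writer.write("ts_month=2027-02", "r1");
    writer.write("ts_month=2027-03", "r2");
    writer.write("ts_month=2027-02", "r3"); // throws: partition already closed
  }
}
{code}
Sorting the input with {{cluster by ts}} satisfies this assumption, which is why the first query succeeds; per the error message, the alternative is a fanout writer that keeps one file open per partition seen so far, trading memory for order-independence.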

  was:
Insert into partitioned table fails with the following error if the data is not 
clustered.


{noformat}
Caused by: java.lang.IllegalStateException: Incoming records violate the writer 
assumption that records are clustered by spec and by partition within each 
spec.

[jira] [Created] (HIVE-28087) Hive Iceberg: Insert into partitioned table fails if the data is not clustered

2024-02-22 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-28087:
--

 Summary: Hive Iceberg: Insert into partitioned table fails if the 
data is not clustered
 Key: HIVE-28087
 URL: https://issues.apache.org/jira/browse/HIVE-28087
 Project: Hive
  Issue Type: Task
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa
 Attachments: query-hive-377.csv

Insert into a partitioned table fails with the following error if the data is 
not clustered.


{noformat}
Caused by: java.lang.IllegalStateException: Incoming records violate the writer 
assumption that records are clustered by spec and by partition within each 
spec. Either cluster the incoming records or switch to fanout writers.
Encountered records that belong to already closed files:
partition 'ts_month=2027-03' in spec [
  1000: ts_month: month(2)
]
    at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:96)
    at 
org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:31)
    at 
org.apache.iceberg.mr.hive.writer.HiveIcebergRecordWriter.write(HiveIcebergRecordWriter.java:53)
    at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1181)
    at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
    at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
    at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:502)
    ... 20 more{noformat}


A simple repro, using the attached csv file: 
[^query-hive-377.csv]
{noformat}
create database t3;

use t3;

create table vector1k(
        t int,
        si int,
        i int,
        b bigint,
        f float,
        d double,
        dc decimal(38,18),
        bo boolean,
        s string,
        s2 string,
        ts timestamp,
        ts2 timestamp,
        dt date)
     row format delimited fields terminated by ',';

load data local inpath "/query-hive-377.csv" OVERWRITE into table vector1k; 


select * from vector1k;

create table vectortab10k(
        t int,
        si int,
        i int,
        b bigint,
        f float,
        d double,
        dc decimal(38,18),
        bo boolean,
        s string,
        s2 string,
        ts timestamp,
        ts2 timestamp,
        dt date)
    stored by iceberg
    stored as orc;
    
insert into vectortab10k select * from vector1k;

select count(*) from vectortab10k limit 10;

create table partition_transform_4(t int, ts timestamp) partitioned by 
spec(month(ts)) stored by iceberg;

insert into table partition_transform_4 select t, ts from vectortab10k ;
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28048) Hive cannot run ORDER BY queries on Iceberg tables partitioned by decimal columns

2024-01-30 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812500#comment-17812500
 ] 

Simhadri Govindappa edited comment on HIVE-28048 at 1/30/24 10:36 PM:
--

This seems to be the same as https://issues.apache.org/jira/browse/HIVE-27938. 
I have a PR for this; please help with the review: 
[https://github.com/apache/hive/pull/5048]  
Thanks! 


was (Author: simhadri-g):
This seems to be the same as https://issues.apache.org/jira/browse/HIVE-27938 
I have a PR up for review. [https://github.com/apache/hive/pull/5048] 

> Hive cannot run ORDER BY queries on Iceberg tables partitioned by decimal 
> columns
> -
>
> Key: HIVE-28048
> URL: https://issues.apache.org/jira/browse/HIVE-28048
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: iceberg
>
> Repro:
> {noformat}
> create table test_dec (d decimal(8,4), i int)
> partitioned by spec (d)
> stored by iceberg;
> insert into test_dec values (3.4, 5), (4.5, 6);
> select * from test_dec order by i;
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28048) Hive cannot run ORDER BY queries on Iceberg tables partitioned by decimal columns

2024-01-30 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812500#comment-17812500
 ] 

Simhadri Govindappa commented on HIVE-28048:


This seems to be the same as https://issues.apache.org/jira/browse/HIVE-27938. 
I have a PR up for review: [https://github.com/apache/hive/pull/5048] 

> Hive cannot run ORDER BY queries on Iceberg tables partitioned by decimal 
> columns
> -
>
> Key: HIVE-28048
> URL: https://issues.apache.org/jira/browse/HIVE-28048
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: iceberg
>
> Repro:
> {noformat}
> create table test_dec (d decimal(8,4), i int)
> partitioned by spec (d)
> stored by iceberg;
> insert into test_dec values (3.4, 5), (4.5, 6);
> select * from test_dec order by i;
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27938) Iceberg: Fix java.lang.ClassCastException during vectorized reads on partition columns

2024-01-29 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27938:
---
Summary: Iceberg: Fix java.lang.ClassCastException during vectorized reads 
on partition columns   (was: Iceberg: Date type Partitioned column throws 
java.lang.ClassCastException: java.time.LocalDate cannot be cast to 
org.apache.hadoop.hive.common.type.Date)

> Iceberg: Fix java.lang.ClassCastException during vectorized reads on 
> partition columns 
> ---
>
> Key: HIVE-27938
> URL: https://issues.apache.org/jira/browse/HIVE-27938
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> {code:java}
> 1: jdbc:hive2://localhost:10001/> CREATE EXTERNAL TABLE ice3   (`col1` int, 
> `calday` date) PARTITIONED BY SPEC (calday)   stored by iceberg 
> tblproperties('format-version'='2'); 
> 1: jdbc:hive2://localhost:10001/>insert into ice3 values(1, '2020-11-20'); 
> 1: jdbc:hive2://localhost:10001/> select count(calday) from ice3;
> {code}
> Full stack trace: 
> {code:java}
> INFO  : Compiling 
> command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): 
> select count(calday) from ice3
> INFO  : No Stats for default@ice3, Columns: calday
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:bigint, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab); 
> Time taken: 0.196 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): 
> select count(calday) from ice3
> INFO  : Query ID = root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Subscribed to counters: [] for queryId: 
> root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab
> INFO  : Session is already open
> INFO  : Dag name: select count(calday) from ice3 (Stage-1)
> INFO  : HS2 Host: [localhost], Query ID: 
> [root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab], Dag ID: 
> [dag_1701888162260_0001_2], DAG Session ID: [application_1701888162260_0001]
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1701888162260_0001)
> --
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 1            container       RUNNING      1          0        0        1       4       0
> Reducer 2        container        INITED      1          0        0        1       0       0
> --
> VERTICES: 00/02  [>>--] 0%    ELAPSED TIME: 1.41 s
> --
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1701888162260_0001_2_00, diagnostics=[Task failed, 
> taskId=task_1701888162260_0001_2_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1701888162260_0001_2_00_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.lang.ClassCastException: java.time.LocalDate cannot be cast to 
> org.apache.hadoop.hive.common.type.Date
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)   
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>at java.security.AccessController.doPrivileged(Native Method)   at 
> javax.security.auth.Subject.doAs(Subject.java:422)   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)  
> at 
> com.google.common.util.concurrent

[jira] [Commented] (HIVE-27938) Iceberg: Date type Partitioned column throws java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date

2024-01-29 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812108#comment-17812108
 ] 

Simhadri Govindappa commented on HIVE-27938:


The error is also present for DATE and DECIMAL columns.
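
For the DATE case, the failure reduces to two different date representations: Iceberg surfaces partition values as {{java.time.LocalDate}}, while Hive's vectorized readers expect {{org.apache.hadoop.hive.common.type.Date}}. A minimal, hypothetical sketch of the kind of bridge involved (it assumes Hive's {{Date.ofEpochDay}} factory; this is illustration, not the actual patch):
{code:java}
import java.time.LocalDate;
import org.apache.hadoop.hive.common.type.Date;

public class DateBridgeSketch {
  public static void main(String[] args) {
    // What Iceberg hands back for a DATE partition value ...
    LocalDate icebergValue = LocalDate.parse("2020-11-20");

    // ... and what Hive's row/batch machinery expects. Casting one to the
    // other directly is what produces the ClassCastException; converting via
    // the epoch day avoids it.
    Date hiveValue = Date.ofEpochDay((int) icebergValue.toEpochDay());
    System.out.println(hiveValue); // prints 2020-11-20
  }
}
{code}
A similar representation gap (hypothetically, {{java.math.BigDecimal}} vs {{HiveDecimal}}) would explain the DECIMAL case.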

> Iceberg: Date type Partitioned column throws java.lang.ClassCastException: 
> java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date
> 
>
> Key: HIVE-27938
> URL: https://issues.apache.org/jira/browse/HIVE-27938
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> {code:java}
> 1: jdbc:hive2://localhost:10001/> CREATE EXTERNAL TABLE ice3   (`col1` int, 
> `calday` date) PARTITIONED BY SPEC (calday)   stored by iceberg 
> tblproperties('format-version'='2'); 
> 1: jdbc:hive2://localhost:10001/>insert into ice3 values(1, '2020-11-20'); 
> 1: jdbc:hive2://localhost:10001/> select count(calday) from ice3;
> {code}
> Full stack trace: 
> {code:java}
> INFO  : Compiling 
> command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): 
> select count(calday) from ice3
> INFO  : No Stats for default@ice3, Columns: calday
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:bigint, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab); 
> Time taken: 0.196 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): 
> select count(calday) from ice3
> INFO  : Query ID = root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Subscribed to counters: [] for queryId: 
> root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab
> INFO  : Session is already open
> INFO  : Dag name: select count(calday) from ice3 (Stage-1)
> INFO  : HS2 Host: [localhost], Query ID: 
> [root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab], Dag ID: 
> [dag_1701888162260_0001_2], DAG Session ID: [application_1701888162260_0001]
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1701888162260_0001)
> --
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 1            container       RUNNING      1          0        0        1       4       0
> Reducer 2        container        INITED      1          0        0        1       0       0
> --
> VERTICES: 00/02  [>>--] 0%    ELAPSED TIME: 1.41 s
> --
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1701888162260_0001_2_00, diagnostics=[Task failed, 
> taskId=task_1701888162260_0001_2_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1701888162260_0001_2_00_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.lang.ClassCastException: java.time.LocalDate cannot be cast to 
> org.apache.hadoop.hive.common.type.Date
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)   
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>at java.security.AccessController.doPrivileged(Native Method)   at 
> javax.security.auth.Subject.doAs(Subject.java:422)   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>  at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)  
> at 
> com.google.common.util.concurrent.TrustedListenableFutu

[jira] [Comment Edited] (HIVE-27929) Run TPC-DS queries and validate results correctness

2024-01-29 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811790#comment-17811790
 ] 

Simhadri Govindappa edited comment on HIVE-27929 at 1/29/24 9:36 AM:
-

I was able to complete a 1 TB TPC-DS run on Hive master with the following versions: 
 # Hive - master (last commit from the 9th of Jan)
 # Hadoop - 3.3.6
 # Tez - 0.10.2 (with a patch to remove the conflicting hadoop-client jar from 
the classpath)

 

With these versions,
 * *ORC external:* I was able to run all the TPC-DS queries successfully. 
 * *ORC managed:* Faced the same issue described above. 


was (Author: simhadri-g):
I was able to complete a 1 TB TPC-DS run on Hive master with the following versions: 
 # Hive - master (last commit from the 9th of Jan)
 # Hadoop - 3.3.6
 # Tez - 0.10.2 (with a patch to remove the conflicting hadoop-client jar from 
the classpath)

 

With these versions,
 * *ORC external:* I was able to run all the TPC-DS queries successfully. 
 * *ORC managed:* Faced the same issue described above. ( HIVE-28004 )

> Run TPC-DS queries and validate results correctness
> ---
>
> Key: HIVE-27929
> URL: https://issues.apache.org/jira/browse/HIVE-27929
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Simhadri Govindappa
>Priority: Major
>
> release branch: *branch-4.0*
> https://github.com/apache/hive/tree/branch-4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27929) Run TPC-DS queries and validate results correctness

2024-01-29 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811790#comment-17811790
 ] 

Simhadri Govindappa edited comment on HIVE-27929 at 1/29/24 9:35 AM:
-

I was able to complete a 1 TB TPC-DS run on Hive master with the following versions: 
 # Hive - master (last commit from the 9th of Jan)
 # Hadoop - 3.3.6
 # Tez - 0.10.2 (with a patch to remove the conflicting hadoop-client jar from 
the classpath)

 

With these versions,
 * *ORC external:* I was able to run all the TPC-DS queries successfully. 
 * *ORC managed:* Faced the same issue described above. ( HIVE-28004 )


was (Author: simhadri-g):
I was able to complete a 1 TB TPC-DS run on Hive master with the following versions: 
 # Hive - master (last commit from the 9th of Jan)
 # Hadoop - 3.3.6
 # Tez - 0.10.2 (with a patch to remove the conflicting hadoop-client jar from 
the classpath)

 

With these versions,
 * *ORC external:* I was able to run all the TPC-DS queries successfully. 
 * *ORC managed:* Faced the same issue described above. ( HIVE-28004 )

 

 

 

> Run TPC-DS queries and validate results correctness
> ---
>
> Key: HIVE-27929
> URL: https://issues.apache.org/jira/browse/HIVE-27929
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Simhadri Govindappa
>Priority: Major
>
> release branch: *branch-4.0*
> https://github.com/apache/hive/tree/branch-4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27929) Run TPC-DS queries and validate results correctness

2024-01-29 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811790#comment-17811790
 ] 

Simhadri Govindappa commented on HIVE-27929:


I was able to complete a 1 TB TPC-DS run on Hive master with the following versions: 
 # Hive - master (last commit from the 9th of Jan)
 # Hadoop - 3.3.6
 # Tez - 0.10.2 (with a patch to remove the conflicting hadoop-client jar from 
the classpath)

 

With these versions,
 * *ORC external:* I was able to run all the TPC-DS queries successfully. 
 * *ORC managed:* Faced the same issue described above. ( HIVE-28004 )

 

 

 

> Run TPC-DS queries and validate results correctness
> ---
>
> Key: HIVE-27929
> URL: https://issues.apache.org/jira/browse/HIVE-27929
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Simhadri Govindappa
>Priority: Major
>
> release branch: *branch-4.0*
> https://github.com/apache/hive/tree/branch-4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28020) Iceberg: Upgrade iceberg version to 1.4.3

2024-01-23 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-28020:
--

 Summary: Iceberg: Upgrade iceberg version to 1.4.3
 Key: HIVE-28020
 URL: https://issues.apache.org/jira/browse/HIVE-28020
 Project: Hive
  Issue Type: Task
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


Iceberg version 1.4.3 has been released. 
[https://github.com/apache/iceberg/releases/tag/apache-iceberg-1.4.3] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27938) Iceberg: Date type Partitioned column throws java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date

2023-12-06 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793902#comment-17793902
 ] 

Simhadri Govindappa commented on HIVE-27938:


The query works fine when vectorization is disabled: 
{noformat}
0: jdbc:hive2://sg-hive-1.sg-hive.root.hwx.si> set 
hive.vectorized.execution.enabled;
+-+
|                   set                   |
+-+
| hive.vectorized.execution.enabled=true  |
+-+
1 row selected (0.014 seconds)
0: jdbc:hive2://sg-hive-1.sg-hive.root.hwx.si> set 
hive.vectorized.execution.enabled=false;
No rows affected (0.008 seconds)
0: jdbc:hive2://sg-hive-1.sg-hive.root.hwx.si> set 
hive.vectorized.execution.enabled;
+--+
|                   set                    |
+--+
| hive.vectorized.execution.enabled=false  |
+--+
1 row selected (0.009 seconds)
0: jdbc:hive2://sg-hive-1.sg-hive.root.hwx.si> select count(calday) from ice3;
INFO  : Compiling 
command(queryId=hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781): 
select count(calday) from ice3
INFO  : No Stats for default@ice3, Columns: calday
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:bigint, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781); Time 
taken: 0.108 seconds
INFO  : Executing 
command(queryId=hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781): 
select count(calday) from ice3
INFO  : Query ID = hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: 
hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781
INFO  : Session is already open
INFO  : Dag name: select count(calday) from ice3 (Stage-1)
INFO  : HS2 Host: [sg-hive-1.sg-hive.root.hwx.site], Query ID: 
[hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781], Dag ID: 
[dag_1700588079029_0015_3], DAG Session ID: [application_1700588079029_0015]
INFO  : Status: Running (Executing on YARN cluster with App id 
application_1700588079029_0015)--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container     SUCCEEDED      1          1        0        0    
   0       0
Reducer 2 .. container     SUCCEEDED      1          1        0        0    
   0       0
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 4.11 s
--
INFO  : Status: DAG finished successfully in 4.08 seconds
INFO  : DAG ID: dag_1700588079029_0015_3
INFO  :
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATION                            DURATION
INFO  : 
--
INFO  : Compile Query                           0.11s
INFO  : Prepare Plan                            0.04s
INFO  : Get Query Coordinator (AM)              0.00s
INFO  : Submit Plan                             0.02s
INFO  : Start DAG                               0.08s
INFO  : Run DAG                                 4.08s
INFO  : 
--
INFO  :
INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   
INPUT_RECORDS   OUTPUT_RECORDS
INFO  : 
--
INFO  :      Map 1           2012.00          4,030            123              
 1                1
INFO  :  Reducer 2             41.00            440             19              
 1                0
INFO  : 
--
INFO  :
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 2
INFO  :    TOTAL_LAUNCHED_TASKS: 2
INFO  :    RACK_LOCAL_TASKS: 1
INFO  :    AM_CPU_MILLISECONDS: 530
INFO  :    AM_GC_TIME_MILLIS: 28
INFO  :    INITIAL_HELD_CONTAINERS: 0

[jira] [Created] (HIVE-27938) Iceberg: Date type Partitioned column throws java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date

2023-12-06 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27938:
--

 Summary: Iceberg: Date type Partitioned column throws 
java.lang.ClassCastException: java.time.LocalDate cannot be cast to 
org.apache.hadoop.hive.common.type.Date
 Key: HIVE-27938
 URL: https://issues.apache.org/jira/browse/HIVE-27938
 Project: Hive
  Issue Type: Bug
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


{code:java}
1: jdbc:hive2://localhost:10001/> CREATE EXTERNAL TABLE ice3   (`col1` int, 
`calday` date) PARTITIONED BY SPEC (calday)   stored by iceberg 
tblproperties('format-version'='2'); 

1: jdbc:hive2://localhost:10001/>insert into ice3 values(1, '2020-11-20'); 

1: jdbc:hive2://localhost:10001/> select count(calday) from ice3;

{code}
Full stack trace: 
{code:java}

INFO  : Compiling 
command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): 
select count(calday) from ice3
INFO  : No Stats for default@ice3, Columns: calday
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:bigint, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab); 
Time taken: 0.196 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): 
select count(calday) from ice3
INFO  : Query ID = root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: 
root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab
INFO  : Session is already open
INFO  : Dag name: select count(calday) from ice3 (Stage-1)
INFO  : HS2 Host: [localhost], Query ID: 
[root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab], Dag ID: 
[dag_1701888162260_0001_2], DAG Session ID: [application_1701888162260_0001]
INFO  : Status: Running (Executing on YARN cluster with App id 
application_1701888162260_0001)
--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--
Map 1            container       RUNNING      1          0        0        1       4       0
Reducer 2        container        INITED      1          0        0        1       0       0
--
VERTICES: 00/02  [>>--] 0%    ELAPSED TIME: 1.41 s
--
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1701888162260_0001_2_00, diagnostics=[Task failed, 
taskId=task_1701888162260_0001_2_00_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_1701888162260_0001_2_00_00_0:java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
java.lang.ClassCastException: java.time.LocalDate cannot be cast to 
org.apache.hadoop.hive.common.type.Date  at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
 at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)   at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
   at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
   at java.security.AccessController.doPrivileged(Native Method)   at 
javax.security.auth.Subject.doAs(Subject.java:422)   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)  at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
  at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
   at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
 at java.lang.Thread.run(Thread.java:750)Caused by: 
org.apache.h

[jira] [Commented] (HIVE-26673) Incorrect row count when vectorisation is enabled

2023-12-06 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793744#comment-17793744
 ] 

Simhadri Govindappa commented on HIVE-26673:


No. This issue was fixed by HIVE-25142.

> Incorrect row count when vectorisation is enabled
> -
>
> Key: HIVE-26673
> URL: https://issues.apache.org/jira/browse/HIVE-26673
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Simhadri Govindappa
>Priority: Major
>
> Repro:
> {noformat}
> select count(*) from
> (SELECT T0.plant_no,
> T0.part_chain,
> T0.part_new,
> T0.part_no
> FROM dm_ads_dims_prod.cloudera_test3 T0
> LEFT JOIN
> (SELECT T0.plant_no,
> T0.part_chain
> FROM
> (SELECT T0.plant_no,
> T0.part_chain,
> count( *) AS ct
> FROM dm_ads_dims_prod.cloudera_test3 T0
> WHERE purchase_pos = pos
> GROUP BY T0.plant_no,
> T0.part_chain) T0
> WHERE ct = 2 ) T1 ON T0.plant_no = T1.plant_no
> AND T0.part_chain = T1.part_chain
> WHERE T0.purchase_pos = T0.pos
> AND (T1.part_chain IS NULL
> OR (T1.part_chain IS NOT NULL
> AND T0.fd = 1)) ) s;
> {noformat}
> Run the query with the following settings on the repro cluster a few times
> {code:java}
> set hive.query.results.cache.enabled=false;
> set hive.compute.query.using.stats=false;
> set hive.auto.convert.join=true;
> {code}
> and the results were
> {code:java}
> 2682424
> 2682426
> 2682425{code}
>  
> Then turn off {{hive.auto.convert.join}}
> {code:java}
> set hive.query.results.cache.enabled=false;
> set hive.compute.query.using.stats=false;
> set hive.auto.convert.join=false;
> {code}
> and the result was always *2682420*
> Analyzing the plans with hive.auto.convert.join enabled vs disabled, the 
> difference is the type of join: Map vs Merge.
> Additionally, vectorization also plays a role; when turned off, the result 
> became correct:
> {code:java}
> SET hive.vectorized.execution.enabled=false;
> {code}
> It is also just a workaround and has a negative impact on performance, but 
> this should help us narrow down where to find the cause of the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26673) Incorrect row count when vectorisation is enabled

2023-12-06 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-26673.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Incorrect row count when vectorisation is enabled
> -
>
> Key: HIVE-26673
> URL: https://issues.apache.org/jira/browse/HIVE-26673
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Simhadri Govindappa
>Priority: Major
> Fix For: 4.0.0
>
>
> Repro:
> {noformat}
> select count(*) from
> (SELECT T0.plant_no,
> T0.part_chain,
> T0.part_new,
> T0.part_no
> FROM dm_ads_dims_prod.cloudera_test3 T0
> LEFT JOIN
> (SELECT T0.plant_no,
> T0.part_chain
> FROM
> (SELECT T0.plant_no,
> T0.part_chain,
> count( *) AS ct
> FROM dm_ads_dims_prod.cloudera_test3 T0
> WHERE purchase_pos = pos
> GROUP BY T0.plant_no,
> T0.part_chain) T0
> WHERE ct = 2 ) T1 ON T0.plant_no = T1.plant_no
> AND T0.part_chain = T1.part_chain
> WHERE T0.purchase_pos = T0.pos
> AND (T1.part_chain IS NULL
> OR (T1.part_chain IS NOT NULL
> AND T0.fd = 1)) ) s;
> {noformat}
> Run the query with the following settings on the repro cluster a few times
> {code:java}
> set hive.query.results.cache.enabled=false;
> set hive.compute.query.using.stats=false;
> set hive.auto.convert.join=true;
> {code}
> and the results were
> {code:java}
> 2682424
> 2682426
> 2682425{code}
>  
> Then turn off {{hive.auto.convert.join}}
> {code:java}
> set hive.query.results.cache.enabled=false;
> set hive.compute.query.using.stats=false;
> set hive.auto.convert.join=false;
> {code}
> and the result was always *2682420*
> Analyzing the plans with hive.auto.convert.join enabled vs disabled, the 
> difference is the type of join: Map vs Merge.
> Additionally, vectorization also plays a role; when turned off, the result 
> became correct:
> {code:java}
> SET hive.vectorized.execution.enabled=false;
> {code}
> It is also just a workaround and has a negative impact on performance, but 
> this should help us narrow down where to find the cause of the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27772) UNIX_TIMESTAMP should return NULL when date fields are out of bounds

2023-10-16 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Description: 
For invalid dates such as 2001-02-31, 2023-04-31, etc., UNIX_TIMESTAMP() is 
returning the timestamp of the last valid date rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for calendar days larger than 31, e.g. 2001-02-32 or 
2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.

In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 0).

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+++---+
| month  | datetimestamp  | timestampcol  |
+++---+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+++---+
5 rows selected (0.131 seconds){noformat}
 

 

 

According to the Java JDK: 
[https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103]

 
{noformat}
  /**
     * Style to resolve dates and times strictly.
     * 
     * Using strict resolution will ensure that all parsed values are within
     * the outer range of valid values for the field. Individual fields may
     * be further processed for strictness.
     * 
     * For example, resolving year-month and day-of-month in the ISO calendar
     * system using strict mode will ensure that the day-of-month is valid
     * for the year-month, rejecting invalid values.
     */
    STRICT,
    /**
     * Style to resolve dates and times in a smart, or intelligent, manner.
     * 
     * Using smart resolution will perform the sensible default for each
     * field, which may be the same as strict, the same as lenient, or a third
     * behavior. Individual fields will interpret this differently.
     * 
     * For example, resolving year-month and day-of-month in the ISO calendar
     * system using smart mode will ensure that the day-of-month is from
     * 1 to 31, converting any value beyond the last valid day-of-month to be
     * the last valid day-of-month.
     */
    SMART,{noformat}
 

 

By default, the DATETIME formatter uses the SMART resolution style and the 
SIMPLE formatter the LENIENT one. Both of these styles are able to resolve 
"invalid" bounds to valid dates. In order to prevent seemingly "invalid" dates 
from being parsed successfully, we have to use the STRICT resolution style. 
However, we cannot simply switch the formatters to always use the STRICT 
resolution because that would break existing applications relying on the 
existing resolution rules. To address the problem reported here and retain the 
previous behaviour, we opted to make the resolution style configurable by 
adding a new property. The new property only affects the DATETIME formatter; 
the SIMPLE formatter is almost deprecated so we don't add new features to it.
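
As a self-contained illustration of the two resolution styles with plain {{java.time}} (independent of Hive's formatter classes; note that strict resolution needs the {{uuuu}} pattern letter, since {{yyyy}} is year-of-era and requires an era field under STRICT):
{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

public class ResolverStyleDemo {
  public static void main(String[] args) {
    // SMART (the java.time default) clamps an out-of-range day-of-month
    // to the last valid day of that month.
    DateTimeFormatter smart =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.SMART);
    System.out.println(LocalDate.parse("2001-02-31", smart)); // 2001-02-28

    // STRICT rejects the same input outright.
    DateTimeFormatter strict =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);
    try {
      LocalDate.parse("2001-02-31", strict);
    } catch (DateTimeParseException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}
Under SMART the day is clamped, matching the behaviour reported above; under STRICT the parse fails, and UNIX_TIMESTAMP can map that failure to NULL.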





 

  was:
For invalid dates such as 2001-02-31, 2023-04-31, etc., UNIX_TIMESTAMP() is 
returning the timestamp of the last valid date rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for calendar days larger than 31, e.g. 2001-02-32 or 
2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.

In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 0).

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetime

[jira] [Commented] (HIVE-27772) UNIX_TIMESTAMP should return NULL when date fields are out of bounds

2023-10-16 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775767#comment-17775767
 ] 

Simhadri Govindappa commented on HIVE-27772:


Thanks [~zabetak] for the review. 

I will update the wiki and the Jira description.

> UNIX_TIMESTAMP should return NULL when date fields are out of bounds
> 
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> For invalid dates such as 2001-02-31, 2023-04-31, etc., UNIX_TIMESTAMP() is 
> returning the timestamp of the last valid date rather than NULL 
> (e.g. UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which 
> converts to '2001-02-28'). However, for calendar days larger than 31, e.g. 
> 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.
> In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 
> 0).
>  
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
> unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from 
> datetimetable;
> INFO  : Compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.102 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : Completed executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.0 seconds
> +++---+
> | month  | datetimestamp  | timestampcol  |
> +++---+
> | Feb    | 2001-02-28     | 983318400     |
> | Feb    | 2001-02-29     | 983318400     |
> | Feb    | 2001-02-30     | 983318400     |
> | Feb    | 2001-02-31     | 983318400     |
> | Feb    | 2001-02-32     | NULL          |
> +++---+
> 5 rows selected (0.131 seconds){noformat}
>  
>  
> It looks like 
> [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
>   by default, the formatter has the SMART resolver style.
> According to the Java JDK: 
> https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103
>  
> {noformat}
>   /**
>      * Style to resolve dates and times strictly.
>      * 
>      * Using strict resolution will ensure that all parsed values are within
>      * the outer range of valid values for the field. Individual fields may
>      * be further processed for strictness.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using strict mode will ensure that the day-of-month is valid
>      * for the year-month, rejecting invalid values.
>      */
>     STRICT,
>     /**
>      * Style to resolve dates and times in a smart, or intelligent, manner.
>      * 
>      * Using smart resolution will perform the sensible default for each
>      * field, which may be the same as strict, the same as lenient, or a third
>      * behavior. Individual fields will interpret this differently.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using smart mode will ensure that the day-of-month is from
>      * 1 to 31, converting any value beyond the last valid day-of-month to be
>      * the last valid day-of-month.
>      */
>     SMART,{noformat}
>  
>  
> Therefore, we should set the resolverStyle to STRICT to reject invalid date 
> values.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() should return null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Summary: Hive UNIX_TIMESTAMP() should return null for invalid dates  (was: 
Hive UNIX_TIMESTAMP() not returning null for invalid dates)

> Hive UNIX_TIMESTAMP() should return null for invalid dates
> -
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> For invalid dates such as 2001-02-31, 2023-04-31, etc., UNIX_TIMESTAMP() is 
> returning the timestamp of the last valid date rather than NULL 
> (e.g. UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which 
> converts to '2001-02-28'). However, for calendar days larger than 31, e.g. 
> 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.
> In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 
> 0).
>  
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
> unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from 
> datetimetable;
> INFO  : Compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.102 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
> timestampCol from datetimetable
> INFO  : Completed executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.0 seconds
> +++---+
> | month  | datetimestamp  | timestampcol  |
> +++---+
> | Feb    | 2001-02-28     | 983318400     |
> | Feb    | 2001-02-29     | 983318400     |
> | Feb    | 2001-02-30     | 983318400     |
> | Feb    | 2001-02-31     | 983318400     |
> | Feb    | 2001-02-32     | NULL          |
> +++---+
> 5 rows selected (0.131 seconds){noformat}
>  
>  
> It looks like 
> [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
>   by default, the formatter has the SMART resolver style.
> According to the Java JDK: 
> https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103
>  
> {noformat}
>   /**
>      * Style to resolve dates and times strictly.
>      * 
>      * Using strict resolution will ensure that all parsed values are within
>      * the outer range of valid values for the field. Individual fields may
>      * be further processed for strictness.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using strict mode will ensure that the day-of-month is valid
>      * for the year-month, rejecting invalid values.
>      */
>     STRICT,
>     /**
>      * Style to resolve dates and times in a smart, or intelligent, manner.
>      * 
>      * Using smart resolution will perform the sensible default for each
>      * field, which may be the same as strict, the same as lenient, or a third
>      * behavior. Individual fields will interpret this differently.
>      * 
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using smart mode will ensure that the day-of-month is from
>      * 1 to 31, converting any value beyond the last valid day-of-month to be
>      * the last valid day-of-month.
>      */
>     SMART,{noformat}
>  
>  
> Therefore, we should set the resolverStyle to STRICT to reject invalid date 
> values.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Description: 
For invalid dates such as 2001-02-31, 2023-04-31, etc., UNIX_TIMESTAMP() is 
returning the timestamp of the last valid date rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for calendar days larger than 31, e.g. 2001-02-32 or 
2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.

In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 0).

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+++---+
| month  | datetimestamp  | timestampcol  |
+++---+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+++---+
5 rows selected (0.131 seconds){noformat}
 

 

It looks like 
[InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
  by default, the formatter has the SMART resolver style.

According to the Java JDK: 
https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103

 
{noformat}
  /**
     * Style to resolve dates and times strictly.
     * 
     * Using strict resolution will ensure that all parsed values are within
     * the outer range of valid values for the field. Individual fields may
     * be further processed for strictness.
     * 
     * For example, resolving year-month and day-of-month in the ISO calendar
     * system using strict mode will ensure that the day-of-month is valid
     * for the year-month, rejecting invalid values.
     */
    STRICT,
    /**
     * Style to resolve dates and times in a smart, or intelligent, manner.
     * 
     * Using smart resolution will perform the sensible default for each
     * field, which may be the same as strict, the same as lenient, or a third
     * behavior. Individual fields will interpret this differently.
     * 
     * For example, resolving year-month and day-of-month in the ISO calendar
     * system using smart mode will ensure that the day-of-month is from
     * 1 to 31, converting any value beyond the last valid day-of-month to be
     * the last valid day-of-month.
     */
    SMART,{noformat}
 

 

Therefore, we should set the resolverStyle to STRICT to reject invalid date 
values.

 

  was:
For invalid dates such as 2001-02-31, 2023-04-31, etc., UNIX_TIMESTAMP() is 
returning the timestamp of the last valid date rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for calendar days larger than 31, e.g. 2001-02-32 or 
2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.

In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 0).

 

 

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, commen

[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Description: 
For invalid dates such as 2001-02-31, 2023-04-31, etc., UNIX_TIMESTAMP() is 
returning the timestamp of the last valid date rather than NULL (e.g. 
UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which converts to 
'2001-02-28'). However, for calendar days larger than 31, e.g. 2001-02-32 or 
2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.

In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 0).

 

 

 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+++---+
| month  | datetimestamp  | timestampcol  |
+++---+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+++---+
5 rows selected (0.131 seconds){noformat}

  was:
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+++---+
| month  | datetimestamp  | timestampcol  |
+++---+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+++---+
5 rows selected (0.131 seconds){noformat}


> Hive UNIX_TIMESTAMP() not returning null for invalid dates
> --
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> For invalid dates such as 2001-02-31, 2023-04-31, etc., UNIX_TIMESTAMP() is 
> giving out the timestamp value as the last valid date, rather than NULL 
> (e.g. UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which 
> converts to '2001-02-28'). However, for calendar days larger than 31, e.g. 
> 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.
> In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 
> 0).
>  
>  
>  
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 

[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27772:
---
Description: 
{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetimetable;
INFO  : Compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
timestampCol from datetimetable
INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.102 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
timestampCol from datetimetable
INFO  : Completed executing 
command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time 
taken: 0.0 seconds
+++---+
| month  | datetimestamp  | timestampcol  |
+++---+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+++---+
5 rows selected (0.131 seconds){noformat}

> Hive UNIX_TIMESTAMP() not returning null for invalid dates
> --
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
> unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from 
> datetimetable;
> INFO  : Compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
> timestampCol from datetimetable
> INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.102 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
> timestampCol from datetimetable
> INFO  : Completed executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.0 seconds
> +++---+
> | month  | datetimestamp  | timestampcol  |
> +++---+
> | Feb    | 2001-02-28     | 983318400     |
> | Feb    | 2001-02-29     | 983318400     |
> | Feb    | 2001-02-30     | 983318400     |
> | Feb    | 2001-02-31     | 983318400     |
> | Feb    | 2001-02-32     | NULL          |
> +++---+
> 5 rows selected (0.131 seconds){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates

2023-10-05 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27772:
--

 Summary: Hive UNIX_TIMESTAMP() not returning null for invalid dates
 Key: HIVE-27772
 URL: https://issues.apache.org/jira/browse/HIVE-27772
 Project: Hive
  Issue Type: Bug
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-10-02 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771151#comment-17771151
 ] 

Simhadri Govindappa commented on HIVE-27754:


 
{quote}
{noformat}
set hive.cbo.fallback.strategy=NEVER;
{noformat}
Can be used to prevent running these statements.
see also:
[https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L687-L688]
{quote}
 

Here is a qfile which can repro the issue even with 
'hive.cbo.fallback.strategy=NEVER;'

[https://github.com/simhadri-g/hive/commit/9520fff464c9d1bf400e5e8f43b5f00bf9615825]

 

 
{quote}If the expression in the where clause has logical operators ({{{}OR{}}}, 
{{{}AND{}}}, ...) the operands are implicitly casted to boolean
[https://github.com/apache/hive/blob/85f6162becb8723ff6c9f85875048ced6ca7ae89/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L842-L847]
{quote}
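For reference, my reading of that implicit cast is that the predicate is 
effectively evaluated like this (a sketch of the semantics, not actual Hive 
output):
{noformat}
-- What was written:
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';

-- What the type checker effectively evaluates:
UPDATE customers_man SET customer_id=22
WHERE (last_name = 'Pierce') OR CAST('Taylor' AS BOOLEAN);

-- CAST of a non-empty string constant to boolean is true, so the whole
-- filter folds to true and every row is updated.
{noformat}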
Will debug through this. 

 

Thanks!

> Query Filter with OR condition updates every record in the table
> 
>
> Key: HIVE-27754
> URL: https://issues.apache.org/jira/browse/HIVE-27754
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
>  
> {noformat}
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' 
> ;{noformat}
>  After the above statement, all the records are updated. The condition 
> {{'Taylor'}} is a constant string, and it will always evaluate to true 
> because it's a non-empty string. So, effectively, the {{UPDATE}} statement is 
> updating all rows in {{customers_man}}.
> Repro:
> {noformat}
> create  table customers_man (customer_id bigint, first_name string) 
> PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES 
> ('transactional'='true');
>  insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", 
> "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", 
> "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", 
> "Johnson"), (3, "Trudy", "Henderson");
>  select * from customers_man;
>  
> ++---+--+
>  | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>  
> ++---+--+
>  | 3  | Blake | Burr  
>|
>  | 2  | Jake  | Donnel
>|
>  | 3  | Trudy | Henderson 
>|
>  | 3  | Trudy | Johnson   
>|
>  | 2  | Susan | Morrison  
>|
>  | 1  | Joanna| Pierce
>|
>  | 2  | Joanna| Silver
>|
>  | 2  | Bob   | Silver
>|
>  | 1  | Sharon| Taylor
>|
>  
> ++---+--+
>  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> last_name='Taylor' ;
>  select * from customers_man;
>  
> ++---+--+
>  | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>  
> ++---+--+
>  | 3  | Blake | Burr  
>|
>  | 2  | Jake  | Donnel
>|
>  | 3  | Trudy | Henderson 
>|
>  | 3  | Trudy | Johnson   
>|
>  | 2  | Susan | Morrison  
>|
>  | 22 | Joanna| Pierce
>|
>  | 2  | Joanna| Silver
>|
>  | 2  | Bob   | Silver
>|
>  | 22 | Sharon| Taylor
>|
>  
> ++---+--+
>   UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> 'Taylor' ;
>   se

[jira] [Assigned] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-09-28 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa reassigned HIVE-27754:
--

Assignee: Simhadri Govindappa

> Query Filter with OR condition updates every record in the table
> 
>
> Key: HIVE-27754
> URL: https://issues.apache.org/jira/browse/HIVE-27754
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
>  
> {noformat}
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' 
> ;{noformat}
>  After the above statement, all the records are updated. The condition 
> {{'Taylor'}} is a constant string, and it will always evaluate to true 
> because it's a non-empty string. So, effectively, the {{UPDATE}} statement is 
> updating all rows in {{customers_man}}.
> Repro:
> {noformat}
> create  table customers_man (customer_id bigint, first_name string) 
> PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES 
> ('transactional'='true');
>  insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", 
> "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", 
> "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", 
> "Johnson"), (3, "Trudy", "Henderson");
>  select * from customers_man;
>  
> ++---+--+
>  | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>  
> ++---+--+
>  | 3  | Blake | Burr  
>|
>  | 2  | Jake  | Donnel
>|
>  | 3  | Trudy | Henderson 
>|
>  | 3  | Trudy | Johnson   
>|
>  | 2  | Susan | Morrison  
>|
>  | 1  | Joanna| Pierce
>|
>  | 2  | Joanna| Silver
>|
>  | 2  | Bob   | Silver
>|
>  | 1  | Sharon| Taylor
>|
>  
> ++---+--+
>  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> last_name='Taylor' ;
>  select * from customers_man;
>  
> ++---+--+
>  | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>  
> ++---+--+
>  | 3  | Blake | Burr  
>|
>  | 2  | Jake  | Donnel
>|
>  | 3  | Trudy | Henderson 
>|
>  | 3  | Trudy | Johnson   
>|
>  | 2  | Susan | Morrison  
>|
>  | 22 | Joanna| Pierce
>|
>  | 2  | Joanna| Silver
>|
>  | 2  | Bob   | Silver
>|
>  | 22 | Sharon| Taylor
>|
>  
> ++---+--+
>   UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> 'Taylor' ;
>   select * from customers_man;
>   
> ++---+--+
>   | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>   
> ++---+--+
>   | 22 | Blake | Burr 
> |
>   | 22 | Jake  | Donnel   
> |
>   | 22 | Trudy | Henderson
> |
>   | 22 | Trudy | Johnson  
> |
>   | 22 | Susan | Morrison 
> |
>   | 22 | Joanna| Pierce  

[jira] [Created] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-09-28 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27754:
--

 Summary: Query Filter with OR condition updates every record in 
the table
 Key: HIVE-27754
 URL: https://issues.apache.org/jira/browse/HIVE-27754
 Project: Hive
  Issue Type: Bug
Reporter: Simhadri Govindappa


 
{noformat}
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' 
;{noformat}
 After the above statement, all the records are updated. The condition 
{{'Taylor'}} is a constant string, and it will always evaluate to true because 
it's a non-empty string. So, effectively, the {{UPDATE}} statement is updating 
all rows in {{customers_man}}.

Repro:


{noformat}
create  table customers_man (customer_id bigint, first_name string) PARTITIONED 
BY (last_name string) STORED AS orc TBLPROPERTIES ('transactional'='true');

 insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", 
"Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", 
"Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", 
"Johnson"), (3, "Trudy", "Henderson");
 select * from customers_man;
 
++---+--+
 | customers_man.customer_id  | customers_man.first_name  | 
customers_man.last_name  |
 
++---+--+
 | 3  | Blake | Burr
 |
 | 2  | Jake  | Donnel  
 |
 | 3  | Trudy | Henderson   
 |
 | 3  | Trudy | Johnson 
 |
 | 2  | Susan | Morrison
 |
 | 1  | Joanna| Pierce  
 |
 | 2  | Joanna| Silver  
 |
 | 2  | Bob   | Silver  
 |
 | 1  | Sharon| Taylor  
 |
 
++---+--+


 UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
last_name='Taylor' ;
 select * from customers_man;
 
++---+--+
 | customers_man.customer_id  | customers_man.first_name  | 
customers_man.last_name  |
 
++---+--+
 | 3  | Blake | Burr
 |
 | 2  | Jake  | Donnel  
 |
 | 3  | Trudy | Henderson   
 |
 | 3  | Trudy | Johnson 
 |
 | 2  | Susan | Morrison
 |
 | 22 | Joanna| Pierce  
 |
 | 2  | Joanna| Silver  
 |
 | 2  | Bob   | Silver  
 |
 | 22 | Sharon| Taylor  
 |
 
++---+--+


  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' ;
  select * from customers_man;
  
++---+--+
  | customers_man.customer_id  | customers_man.first_name  | 
customers_man.last_name  |
  
++---+--+
  | 22 | Blake | Burr   
  |
  | 22 | Jake  | Donnel 
  |
  | 22 | Trudy | Henderson  
  |
  | 22 | Trudy | Johnson
  |
  | 22 | Susan | Morrison   
  |
  | 22 | Joanna| Pierce 
  |
  | 22 | Joanna| Silver 
  |
  | 22 | Bob   | Silver 
  |
  | 22 | Sharon| Taylor 
  |
  
++---+--+

--- simpler repro
UPDATE customers_man SET customer_id=23 WHERE true;
select * from customers_man; 

+

[jira] [Commented] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting writes

2023-09-22 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767972#comment-17767972
 ] 

Simhadri Govindappa commented on HIVE-27646:


updated the fix version to 4.0.0

> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> writes
> -
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Assume two concurrent update queries- Query A and Query B, that have 
> overlapping updates.
> If Query A commits the data and delete files first, then Query B will fail 
> with validation failure due to conflicting writes. 
> In this case, Query B should invalidate the commit files that are already 
> generated and re-execute the full query on the latest snapshot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting writes

2023-09-22 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27646:
---
Fix Version/s: 4.0.0

> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> writes
> -
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Assume two concurrent update queries- Query A and Query B, that have 
> overlapping updates.
> If Query A commits the data and delete files first, then Query B will fail 
> with validation failure due to conflicting writes. 
> In this case, Query B should invalidate the commit files that are already 
> generated and re-execute the full query on the latest snapshot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting writes

2023-09-22 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-27646.

Resolution: Fixed

Change merged to master. 

Thanks [~dkuzmenko] and [@suenalaba|https://github.com/suenalaba] for the 
review!
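
For anyone landing here later, a rough sketch of the scenario this change 
handles (table and column names below are illustrative):
{noformat}
-- Session A and Session B both start from snapshot S0 of the same Iceberg table.

-- Session A:
UPDATE orders SET quantity = 0 WHERE item_id = 5;  -- commits first, producing snapshot S1

-- Session B, concurrently:
UPDATE orders SET quantity = 9 WHERE item_id = 5;  -- commit validation fails against S1

-- With this change, Session B discards the data/delete files it already staged
-- and re-executes the whole statement on top of S1 instead of failing the query.
{noformat}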

> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> writes
> -
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> Assume two concurrent update queries- Query A and Query B, that have 
> overlapping updates.
> If Query A commits the data and delete files first, then Query B will fail 
> with validation failure due to conflicting writes. 
> In this case, Query B should invalidate the commit files that are already 
> generated and re-execute the full query on the latest snapshot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27656) Upgrade jansi.version to 2.4.0

2023-09-14 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-27656.

Resolution: Fixed

> Upgrade jansi.version to 2.4.0 
> ---
>
> Key: HIVE-27656
> URL: https://issues.apache.org/jira/browse/HIVE-27656
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> [https://github.com/fusesource/jansi/blob/master/changelog.md]
> Arm64/aarch64 support is added in jansi version 2.4.0 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27656) Upgrade jansi.version to 2.4.0

2023-09-14 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765132#comment-17765132
 ] 

Simhadri Govindappa commented on HIVE-27656:


Change has been merged to master. 

Thanks, [~zabetak] , [~lvegh]  and [~ayushtkn]! 

> Upgrade jansi.version to 2.4.0 
> ---
>
> Key: HIVE-27656
> URL: https://issues.apache.org/jira/browse/HIVE-27656
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>
> [https://github.com/fusesource/jansi/blob/master/changelog.md]
> Arm64/aarch64 support is added in jansi version 2.4.0 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27567) Support building multi-platform images

2023-09-12 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-27567.

Resolution: Fixed

> Support building multi-platform images
> --
>
> Key: HIVE-27567
> URL: https://issues.apache.org/jira/browse/HIVE-27567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zhihua Deng
>Assignee: Simhadri Govindappa
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27567) Support building multi-platform images

2023-09-12 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764104#comment-17764104
 ] 

Simhadri Govindappa commented on HIVE-27567:


Fixed in HIVE-27277. From Hive 4.0.0-beta-1, the Hive docker image supports both 
arm64 and amd64 platforms: https://hub.docker.com/r/apache/hive/tags

> Support building multi-platform images
> --
>
> Key: HIVE-27567
> URL: https://issues.apache.org/jira/browse/HIVE-27567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zhihua Deng
>Assignee: Simhadri Govindappa
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27656) Upgrade jansi.version to 2.4.0

2023-08-30 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27656:
--

 Summary: Upgrade jansi.version to 2.4.0 
 Key: HIVE-27656
 URL: https://issues.apache.org/jira/browse/HIVE-27656
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


[https://github.com/fusesource/jansi/blob/master/changelog.md]

Arm64/aarch64 support is added in jansi version 2.4.0 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27265) Ensure table properties are case-insensitive when translating hms property to iceberg property

2023-08-28 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-27265.

Resolution: Won't Fix

Table properties are case-sensitive. 
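
In other words, Iceberg only recognizes the exact lower-case key, so the fix on 
the user side is simply to spell it that way. A minimal sketch using the same 
repro as below:
{noformat}
-- Recognized: creates a v2 (transactional) table, so DELETE works
CREATE EXTERNAL TABLE TBL5(ID INT, NAME STRING) PARTITIONED BY (DEPT STRING)
STORED BY ICEBERG STORED AS PARQUET TBLPROPERTIES ('format-version'='2');

-- 'FORMAT-VERSION' is treated as an unrelated property, the table stays v1,
-- and DELETE fails with "table ... is not transactional"
CREATE EXTERNAL TABLE TBL6(ID INT, NAME STRING) PARTITIONED BY (DEPT STRING)
STORED BY ICEBERG STORED AS PARQUET TBLPROPERTIES ('FORMAT-VERSION'='2');
{noformat}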

> Ensure table properties are case-insensitive when translating hms property to 
> iceberg property
> --
>
> Key: HIVE-27265
> URL: https://issues.apache.org/jira/browse/HIVE-27265
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
>  
> In this example, the "format-version" case is modified to upper case and the 
> query fails.
>  
> {noformat}
> >>>CREATE EXTERNAL TABLE TBL5(ID INT, NAME STRING) PARTITIONED BY (DEPT 
> >>>STRING) STORED BY ICEBERG STORED AS PARQUET TBLPROPERTIES 
> >>>('format-version'='2');
> OK
> >>>insert into tbl5 values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> >>>(4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> OK
> >>>delete from tbl5 where name in ('one', 'four') or id = 22;
> OK{noformat}
>  
> {noformat}
> >>> CREATE EXTERNAL TABLE TBL6(ID INT, NAME STRING) PARTITIONED BY (DEPT 
> >>> STRING) STORED BY ICEBERG STORED AS PARQUET TBLPROPERTIES 
> >>> ('FORMAT-VERSION'='2');
> ok
> >>>insert into tbl6 values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> >>>(4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> ok
> >>>delete from tbl6 where name in ('one', 'four') or id = 22;
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10297]: Attempt to do update or delete on table tbl6 that is not 
> transactional (state=42000,code=10297){noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files

2023-08-28 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27653:
--

 Summary: Iceberg: Add conflictDetectionFilter to validate 
concurrently added data and delete files
 Key: HIVE-27653
 URL: https://issues.apache.org/jira/browse/HIVE-27653
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27277) Set up github actions workflow to build and push docker image to docker hub

2023-08-25 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759033#comment-17759033
 ] 

Simhadri Govindappa commented on HIVE-27277:


Thanks [~ayushtkn] .

 
{noformat}
tags: ${{ secrets.DOCKERHUB_USER }}/hive:${{ env.tag }}{noformat}
This occurred because secrets.DOCKERHUB_USER should ideally be the repo name, 
which is "apache", but it turns out it's "afsjenkins".

 

> Set up github actions workflow to build and push docker image to docker hub
> ---
>
> Key: HIVE-27277
> URL: https://issues.apache.org/jira/browse/HIVE-27277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting writes

2023-08-24 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27646:
---
Summary: Iceberg: Retry query when concurrent write queries fail due to 
conflicting writes  (was: Iceberg: Retry query when concurrent write queries 
fail due to conflicting write)

> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> writes
> -
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> Assume two concurrent update queries- Query A and Query B, that have 
> overlapping updates.
> If Query A commits the data and delete files first, then Query B will fail 
> with validation failure due to conflicting writes. 
> In this case, Query B should invalidate the commit files that are already 
> generated and re-execute the full query on the latest snapshot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write

2023-08-24 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27646:
---
Description: 
Assume two concurrent update queries- Query A and Query B, that have 
overlapping updates.

If Query A commits the data and delete files first, then Query B will fail with 
validation failure due to conflicting writes. 

In this case, Query B should invalidate the commit files that are already 
generated and re-execute the full query on the latest snapshot.

  was:
During concurrent updates,

Assume 2 concurrent update queries- Query A and Query B that have intersecting 
updates

If Query A commits the data and delet

If any conflicting files are detected during the commit stage of the query that 
commits last,  we will have to re-execute the full query. 


> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> write
> 
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> Assume two concurrent update queries- Query A and Query B, that have 
> overlapping updates.
> If Query A commits the data and delete files first, then Query B will fail 
> with validation failure due to conflicting writes. 
> In this case, Query B should invalidate the commit files that are already 
> generated and re-execute the full query on the latest snapshot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write

2023-08-24 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27646:
---
Description: 
During concurrent updates,

Assume 2 concurrent update queries- Query A and Query B that have intersecting 
updates

If Query A commits the data and delet

If any conflicting files are detected during the commit stage of the query that 
commits last,  we will have to re-execute the full query. 

  was:
During concurrent updates,

Assume 2 concurrent update queries- Query A

If any conflicting files are detected during the commit stage of the query that 
commits last,  we will have to re-execute the full query. 


> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> write
> 
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> During concurrent updates,
> Assume 2 concurrent update queries- Query A and Query B that have 
> intersecting updates
> If Query A commits the data and delet
> If any conflicting files are detected during the commit stage of the query 
> that commits last,  we will have to re-execute the full query. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write

2023-08-24 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27646:
---
Description: 
During concurrent updates,

Assume 2 concurrent update queries- Query A

If any conflicting files are detected during the commit stage of the query that 
commits last,  we will have to re-execute the full query. 

  was:
During concurrent updates,

If any conflicting files are detected during the commit stage of the query that 
commits last, we will have to re-execute the full query. 


> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> write
> 
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> During concurrent updates,
> Assume 2 concurrent update queries- Query A
> If any conflicting files are detected during the commit stage of the query 
> that commits last,  we will have to re-execute the full query. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write

2023-08-24 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27646:
---
Description: 
During concurrent updates,

If any conflicting files are detected during the commit stage of the query that 
commits last, we will have to re-execute the full query. 

> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> write
> 
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> During concurrent updates,
> If any conflicting files are detected during the commit stage of the query 
> that commits last, we will have to re-execute the full query. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write

2023-08-24 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27646:
---
Summary: Iceberg: Retry query when concurrent write queries fail due to 
conflicting write  (was: Iceberg: Re-execute query when concurrent writes fail 
due to conflicting write)

> Iceberg: Retry query when concurrent write queries fail due to conflicting 
> write
> 
>
> Key: HIVE-27646
> URL: https://issues.apache.org/jira/browse/HIVE-27646
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27646) Iceberg: Re-execute query when concurrent writes fail due to conflicting write

2023-08-24 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27646:
--

 Summary: Iceberg: Re-execute query when concurrent writes fail due 
to conflicting write
 Key: HIVE-27646
 URL: https://issues.apache.org/jira/browse/HIVE-27646
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27589) Iceberg: Branches of Merge/Update statements should be committed atomically

2023-08-21 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756936#comment-17756936
 ] 

Simhadri Govindappa commented on HIVE-27589:


Thanks [~dkuzmenko], [~krisztiankasa] and [~zhangbutao] :)

> Iceberg: Branches of Merge/Update statements should be committed atomically
> ---
>
> Key: HIVE-27589
> URL: https://issues.apache.org/jira/browse/HIVE-27589
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27589) Iceberg: Branches of Merge/Update statements should be committed atomically

2023-08-10 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa reassigned HIVE-27589:
--

Assignee: Simhadri Govindappa

> Iceberg: Branches of Merge/Update statements should be committed atomically
> ---
>
> Key: HIVE-27589
> URL: https://issues.apache.org/jira/browse/HIVE-27589
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Simhadri Govindappa
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27528) Hive iceberg: Alter table command should not call the metastore to update column stats

2023-07-25 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27528:
---
Description: 
l.

The bit vector that contains the ndv values is being overwritten by 
updatecolstats which is called during alter table command.

  was:It overwrites the previously calculated bit vectors for ndv as well.


> Hive iceberg:  Alter table command should not call the metastore to update 
> column stats
> ---
>
> Key: HIVE-27528
> URL: https://issues.apache.org/jira/browse/HIVE-27528
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> l.
> The bit vector that contains the ndv values is being overwritten by 
> updatecolstats which is called during alter table command.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27528) Hive iceberg: Alter table command should not call the metastore to update column stats

2023-07-25 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27528:
---
Description: 
 

 

The bit vector that contains the ndv values is being overwritten by 
updatecolstats which is called during alter table command.

  was:
l.

The bit vector that contains the ndv values is being overwritten by 
updatecolstats which is called during alter table command.


> Hive iceberg:  Alter table command should not call the metastore to update 
> column stats
> ---
>
> Key: HIVE-27528
> URL: https://issues.apache.org/jira/browse/HIVE-27528
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
>  
>  
> The bit vector that contains the ndv values is being overwritten by 
> updatecolstats which is called during alter table command.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27528) Hive iceberg: Alter table command should not call the metastore to update column stats

2023-07-25 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa updated HIVE-27528:
---
Description: It overwrites the previously calculated bit vectors for ndv as 
well.

> Hive iceberg:  Alter table command should not call the metastore to update 
> column stats
> ---
>
> Key: HIVE-27528
> URL: https://issues.apache.org/jira/browse/HIVE-27528
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
> It overwrites the previously calculated bit vectors for ndv as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27528) Hive iceberg: Alter table command should not call the metastore to update column stats

2023-07-25 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27528:
--

 Summary: Hive iceberg:  Alter table command should not call the 
metastore to update column stats
 Key: HIVE-27528
 URL: https://issues.apache.org/jira/browse/HIVE-27528
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27514) Patched-iceberg-core pom version contains an expression but should be a constant.

2023-07-19 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27514:
--

 Summary: Patched-iceberg-core pom  version contains an expression 
but should be a constant.
 Key: HIVE-27514
 URL: https://issues.apache.org/jira/browse/HIVE-27514
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


When building the iceberg module, Maven throws the following warning:
{noformat}
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hive:patched-iceberg-api:jar:patched-1.3.0-4.0.0-beta-1-SNAPSHOT
[WARNING] 'version' contains an expression but should be a constant. @ 
org.apache.hive:patched-iceberg-api:patched-${iceberg.version}-${project.parent.version},
 
/Users/simhadri.govindappa/Documents/apache/hive/iceberg/patched-iceberg-api/pom.xml,
 line 12, column 12
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hive:patched-iceberg-core:jar:patched-1.3.0-4.0.0-beta-1-SNAPSHOT
[WARNING] 'version' contains an expression but should be a constant. @ 
org.apache.hive:patched-iceberg-core:patched-${iceberg.version}-${project.parent.version},
 
/Users/simhadri.govindappa/Documents/apache/hive/iceberg/patched-iceberg-core/pom.xml,
 line 12, column 12
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten 
the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support 
building such malformed projects.
[WARNING] {noformat}

 Future Maven versions might no longer support building such malformed projects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27448) Hive Iceberg: Merge column stats

2023-06-17 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27448:
--

 Summary: Hive Iceberg: Merge column stats 
 Key: HIVE-27448
 URL: https://issues.apache.org/jira/browse/HIVE-27448
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27433) Hive-site: Add redirect to blogs

2023-06-12 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27433:
--

 Summary: Hive-site: Add redirect to blogs
 Key: HIVE-27433
 URL: https://issues.apache.org/jira/browse/HIVE-27433
 Project: Hive
  Issue Type: Improvement
  Components: Website
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin

2023-05-18 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa reassigned HIVE-27356:
--

Assignee: Simhadri Govindappa

> Hive should write name of blob type instead of table name in Puffin
> ---
>
> Key: HIVE-27356
> URL: https://issues.apache.org/jira/browse/HIVE-27356
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Simhadri Govindappa
>Priority: Major
>
> Currently Hive writes the name of the table plus snapshot id as blob type:
> [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
> Instead, it should write the name of the blob it writes. Table name and 
> snapshot id are redundant information anyway, as they can be inferred from 
> the location and filename of the puffin file.
> Currently it writes a non-standard blob (Standard blob types are listed 
> [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
>  I think it would be better to write standard blobs for interoperability. But 
> if Hive wants to write non-standard blobs anyway, it should still come up 
> with a descriptive name for them, e.g. 'hive-column-statistics-v1'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin

2023-05-18 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723889#comment-17723889
 ] 

Simhadri Govindappa edited comment on HIVE-27356 at 5/18/23 10:56 AM:
--

Sure.

{quote}
Currently it writes a non-standard blob (Standard blob types are listed 
[here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
 I think it would be better to write standard blobs for interoperability. But 
if Hive wants to write non-standard blobs anyway, it should still come up with 
a descriptive name for them, e.g. 'hive-column-statistics-v1'.
{quote}

The initial design went with the col stats object. We can easily change this to 
a different blob type.  


was (Author: simhadri-g):
Sure

> Hive should write name of blob type instead of table name in Puffin
> ---
>
> Key: HIVE-27356
> URL: https://issues.apache.org/jira/browse/HIVE-27356
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> Currently Hive writes the name of the table plus snapshot id as blob type:
> [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
> Instead, it should write the name of the blob it writes. Table name and 
> snapshot id are redundant information anyway, as they can be inferred from 
> the location and filename of the puffin file.
> Currently it writes a non-standard blob (Standard blob types are listed 
> [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
>  I think it would be better to write standard blobs for interoperability. But 
> if Hive wants to write non-standard blobs anyway, it should still come up 
> with a descriptive name for them, e.g. 'hive-column-statistics-v1'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin

2023-05-18 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723889#comment-17723889
 ] 

Simhadri Govindappa commented on HIVE-27356:


Sure

> Hive should write name of blob type instead of table name in Puffin
> ---
>
> Key: HIVE-27356
> URL: https://issues.apache.org/jira/browse/HIVE-27356
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> Currently Hive writes the name of the table plus snapshot id as blob type:
> [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
> Instead, it should write the name of the blob it writes. Table name and 
> snapshot id are redundant information anyway, as they can be inferred from 
> the location and filename of the puffin file.
> Currently it writes a non-standard blob (Standard blob types are listed 
> [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
>  I think it would be better to write standard blobs for interoperability. But 
> if Hive wants to write non-standard blobs anyway, it should still come up 
> with a descriptive name for them, e.g. 'hive-column-statistics-v1'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27329) Document usage of the image

2023-05-12 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722236#comment-17722236
 ] 

Simhadri Govindappa commented on HIVE-27329:


Got it. Updated it in the official Docker Hub:
[https://hub.docker.com/r/apache/hive] 

 

I am also updating the Hive website to include this:
[https://github.com/apache/hive-site/pull/5] 

> Document usage of the image
> ---
>
> Key: HIVE-27329
> URL: https://issues.apache.org/jira/browse/HIVE-27329
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zhihua Deng
>Assignee: Simhadri Govindappa
>Priority: Major
>
> After we pushed the image to docker hub, it would be good to update 
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted for using the 
> image.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27329) Document usage of the image

2023-05-12 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa reassigned HIVE-27329:
--

Assignee: Simhadri Govindappa

> Document usage of the image
> ---
>
> Key: HIVE-27329
> URL: https://issues.apache.org/jira/browse/HIVE-27329
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zhihua Deng
>Assignee: Simhadri Govindappa
>Priority: Major
>
> After we pushed the image to docker hub, it would be good to update 
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted for using the 
> image.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27339) Hive Website: Add links to the new hive dockerhub

2023-05-12 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27339:
--

 Summary: Hive Website: Add links to the new hive dockerhub
 Key: HIVE-27339
 URL: https://issues.apache.org/jira/browse/HIVE-27339
 Project: Hive
  Issue Type: Improvement
  Components: Website
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27329) Document usage of the image

2023-05-11 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721689#comment-17721689
 ] 

Simhadri Govindappa commented on HIVE-27329:


Sure

> Document usage of the image
> ---
>
> Key: HIVE-27329
> URL: https://issues.apache.org/jira/browse/HIVE-27329
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zhihua Deng
>Priority: Major
>
> After we pushed the image to docker hub, it would be good to update 
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted for using the 
> image.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27327) Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans

2023-05-08 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27327:
--

 Summary: Iceberg basic stats: Incorrect row count in snapshot 
summary leading to unoptimized plans
 Key: HIVE-27327
 URL: https://issues.apache.org/jira/browse/HIVE-27327
 Project: Hive
  Issue Type: Bug
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


In the absence of equality deletes, the total row count should be:
{noformat}
row_count = total-records - total-position-deletes{noformat}
 

 

Example:

After many inserts and deletes, there are only 46 records in the table.
{noformat}
>>select count(*) from llap_orders;
+--+
| _c0  |
+--+
| 46   |
+--+
1 row selected (7.22 seconds)

{noformat}
 

But the total records in the snapshot summary indicate that there are 300 records:

 
{noformat}
 {
    "sequence-number" : 19,
    "snapshot-id" : 4237525869561629328,
    "parent-snapshot-id" : 2572487769557272977,
    "timestamp-ms" : 1683553017982,
    "summary" : {
      "operation" : "append",
      "added-data-files" : "5",
      "added-records" : "12",
      "added-files-size" : "3613",
      "changed-partition-count" : "5",
      "total-records" : "300",
      "total-files-size" : "164405",
      "total-data-files" : "100",
      "total-delete-files" : "73",
      "total-position-deletes" : "254",
      "total-equality-deletes" : "0"
    }{noformat}
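
Plugging the summary above into the formula: 300 total-records - 254 
total-position-deletes = 46 rows, which matches the count(*) result, while the 
planner below still scans with an estimate of 300 rows (and 150 after the 
filter).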
 

As a result of this, the Hive plans generated are unoptimized.
{noformat}
0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set 
itemid=7 where itemid=5;

INFO  : OK
++
|                      Explain                       |
++
| Vertex dependency in root stage                    |
| Reducer 2 <- Map 1 (SIMPLE_EDGE)                   |
| Reducer 3 <- Map 1 (SIMPLE_EDGE)                   |
|                                                    |
| Stage-4                                            |
|   Stats Work{}                                     |
|     Stage-0                                        |
|       Move Operator                                |
|         table:{"name:":"db.llap_orders"}           |
|         Stage-3                                    |
|           Dependency Collection{}                  |
|             Stage-2                                |
|               Reducer 2 vectorized                 |
|               File Output Operator [FS_14]         |
|                 table:{"name:":"db.llap_orders"}   |
|                 Select Operator [SEL_13] (rows=150 width=424) |
|                   
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
 |
|                 <-Map 1 [SIMPLE_EDGE]              |
|                   SHUFFLE [RS_4]                   |
|                     Select Operator [SEL_3] (rows=150 width=424) |
|                       
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9"]
 |
|                       Select Operator [SEL_2] (rows=150 width=644) |
|                         
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9","_col10","_col11","_col13","_col14","_col15"]
 |
|                         Filter Operator [FIL_9] (rows=150 width=220) |
|                           predicate:(itemid = 5)   |
|                           TableScan [TS_0] (rows=300 width=220) |
|                             
db@llap_orders,llap_orders,Tbl:COMPLETE,Col:COMPLETE,Output:["orderid","quantity","itemid","tradets","p1","p2"]
 |
|               Reducer 3 vectorized                 |
|               File Output Operator [FS_16]         |
|                 table:{"name:":"db.llap_orders"}   |
|                 Select Operator [SEL_15]           |
|                   
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col4","_col5"] |
|                 <-Map 1 [SIMPLE_EDGE]              |
|                   SHUFFLE [RS_10]                  |
|                     PartitionCols:_col4, _col5     |
|                     Select Operator [SEL_7] (rows=150 width=220) |
|                       
Output:["_col0","_col1","_col2","_col3","_col4","_col5"] |
|                        Please refer to the previous Select Operator [SEL_2] |
|                                                    |
++
39 rows selected (0.104 seconds)
0: jdbc:hive2://simhadrigovindappa-2.simhadri>{noformat}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-23394) TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky

2023-05-08 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-23394.

Resolution: Fixed

> TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky
> 
>
> Key: HIVE-23394
> URL: https://issues.apache.org/jira/browse/HIVE-23394
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> both 
> TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1 and
> TestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1
> can fail with the exception below
> seems like the connection was lost
> {code}
> Error Message
> Failed to close statement
> Stacktrace
> java.sql.SQLException: Failed to close statement
>   at org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:200)
>   at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:205)
>   at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:222)
>   at org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.runQuery(AbstractTestJdbcGenericUDTFGetSplits.java:135)
>   at org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1(AbstractTestJdbcGenericUDTFGetSplits.java:164)
>   at org.apache.hive.jdbc.TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1(TestJdbcGenericUDTFGetSplits2.java:28)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: org.apache.thrift.TApplicationException: CloseOperation failed: out of sequence response
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:84)
>   at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:521)
>   at org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:508)
>   at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1732)
>   at com.sun.proxy.$Proxy146.CloseOperation(Unknown Source)
>   at org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:193)
>   ... 14 more
> {code}
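
For readers hitting the same failure: "out of sequence response" from a Thrift client is the classic symptom of two threads interleaving request/response pairs on one shared connection. The sketch below is illustrative only (the class and method names are invented, not the actual Hive source); it shows the pattern behind the HiveConnection$SynchronizedHandler frame in the trace above, a dynamic proxy that serializes every call to the wrapped client through a single lock.

{code}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Minimal sketch, assuming the delegate is a non-thread-safe RPC client
// exposed through an interface. All names here are illustrative.
public final class SynchronizedClient {

  public static <T> T wrap(Class<T> iface, T delegate) {
    final Object lock = new Object();
    InvocationHandler handler = (proxy, method, args) -> {
      // Allow only one RPC in flight at a time, so request/response
      // pairs on the shared transport can never interleave.
      synchronized (lock) {
        try {
          return method.invoke(delegate, args);
        } catch (InvocationTargetException e) {
          throw e.getCause(); // rethrow the client's real exception
        }
      }
    };
    return iface.cast(Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler));
  }
}
{code}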



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-23394) TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky

2023-05-08 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720466#comment-17720466
 ] 

Simhadri Govindappa commented on HIVE-23394:


The change is merged to master.

Thanks, [~dkuzmenko] and [~ayushtkn], for the review!

> TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky
> ---
>
> Key: HIVE-23394
> URL: https://issues.apache.org/jira/browse/HIVE-23394
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Both TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1 and
> TestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1 can fail
> with the exception below; it seems the connection was lost.
> {code}
> Error Message
> Failed to close statement
> Stacktrace
> java.sql.SQLException: Failed to close statement
>   at org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:200)
>   at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:205)
>   at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:222)
>   at org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.runQuery(AbstractTestJdbcGenericUDTFGetSplits.java:135)
>   at org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1(AbstractTestJdbcGenericUDTFGetSplits.java:164)
>   at org.apache.hive.jdbc.TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1(TestJdbcGenericUDTFGetSplits2.java:28)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: org.apache.thrift.TApplicationException: CloseOperation failed: out of sequence response
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:84)
>   at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:521)
>   at org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:508)
>   at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1732)
>   at com.sun.proxy.$Proxy146.CloseOperation(Unknown Source)
>   at org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:193)
>   ... 14 more
> {code}
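
On the test side, a hedged sketch of the close-safety idea (the helper name and shape are hypothetical, not the actual AbstractTestJdbcGenericUDTFGetSplits code): with try-with-resources the statement is closed exactly once even when execute() throws, and an exception thrown by close() after a successful query surfaces on its own instead of masking the query's result.

{code}
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration only.
final class QueryRunner {

  static void runQuery(Connection conn, String sql) throws SQLException {
    // try-with-resources guarantees close() runs exactly once; if both
    // execute() and close() throw, close()'s exception is attached to
    // the first one as a suppressed exception rather than replacing it.
    try (Statement stmt = conn.createStatement()) {
      stmt.execute(sql);
    }
  }
}
{code}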



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27277) Set up github actions workflow to build and push docker image to docker hub

2023-04-25 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716378#comment-17716378
 ] 

Simhadri Govindappa edited comment on HIVE-27277 at 4/25/23 6:04 PM:
-

 

INFRA-24505: Docker repo created for Apache Hive: [https://hub.docker.com/r/apache/hive]

 


was (Author: simhadri-g):
Docker Repo created for apache hive: [https://hub.docker.com/r/apache/hive]

 

> Set up github actions workflow to build and push docker image to docker hub
> ---
>
> Key: HIVE-27277
> URL: https://issues.apache.org/jira/browse/HIVE-27277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

