[jira] [Commented] (HIVE-28524) Iceberg: Major QB Compaction add sort order support
[ https://issues.apache.org/jira/browse/HIVE-28524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888238#comment-17888238 ]

Simhadri Govindappa commented on HIVE-28524:
--------------------------------------------

Change merged to master. Thanks [~difin] for the PR!

> Iceberg: Major QB Compaction add sort order support
> ---------------------------------------------------
>
>                Key: HIVE-28524
>                URL: https://issues.apache.org/jira/browse/HIVE-28524
>            Project: Hive
>         Issue Type: Improvement
>     Security Level: Public (Viewable by anyone)
>         Components: Hive
>           Reporter: Dmitriy Fingerman
>           Assignee: Dmitriy Fingerman
>           Priority: Major
>             Labels: pull-request-available
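As a usage illustration (not part of the original notification): a minimal JDBC sketch that triggers an Iceberg major compaction with an explicit sort order. The ORDER BY clause on ALTER TABLE ... COMPACT is the capability this change adds, but the exact accepted syntax should be checked against the Hive docs and qtests; the connection URL, table name (ice_orders), and sort column (order_id) are hypothetical.

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IcebergCompactWithSortOrder {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint; adjust to your cluster.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Request a major compaction of an Iceberg table, rewriting the
            // data files sorted by the given column (the HIVE-28524 feature).
            stmt.execute(
                "ALTER TABLE ice_orders COMPACT 'MAJOR' AND WAIT ORDER BY order_id DESC");
        }
    }
}
{noformat}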
[jira] [Created] (HIVE-28561) Upgrade Hive 4.0
Simhadri Govindappa created HIVE-28561:
------------------------------------------

             Summary: Upgrade Hive 4.0
                 Key: HIVE-28561
                 URL: https://issues.apache.org/jira/browse/HIVE-28561
             Project: Hive
          Issue Type: Improvement
      Security Level: Public (Viewable by anyone)
            Reporter: Simhadri Govindappa
            Assignee: Simhadri Govindappa

Hive 4.0 has been released; we would like to upgrade the version of Hive used in Ranger to 4.0.
[jira] [Updated] (HIVE-28373) Iceberg: Refactor the code of HadoopTableOptions
[ https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simhadri Govindappa updated HIVE-28373:
---------------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Iceberg: Refactor the code of HadoopTableOptions
> ------------------------------------------------
>
>                Key: HIVE-28373
>                URL: https://issues.apache.org/jira/browse/HIVE-28373
>            Project: Hive
>         Issue Type: Improvement
>         Components: Iceberg integration
>   Affects Versions: 4.0.0
>           Reporter: yongzhi.shao
>           Assignee: yongzhi.shao
>           Priority: Major
>             Labels: pull-request-available
>            Fix For: 4.1.0
>
> Since there are a lot of problems with hadoop_catalog, we submitted the following PR to the Iceberg community:
> [core: Refactor the code of HadoopTableOptions by BsoBird · Pull Request #10623 · apache/iceberg|https://github.com/apache/iceberg/pull/10623]
> With this PR, we can implement atomic operations based on HadoopCatalog. However, the PR was not accepted by the Iceberg community, and it seems the Iceberg community is trying to remove support for HadoopCatalog (keeping it only for testing).
> Since Hive itself supports a number of features based on hadoop_catalog tables, can we merge this patch in Hive?
[jira] [Commented] (HIVE-28373) Fix HadoopCatalog based table
[ https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880929#comment-17880929 ]

Simhadri Govindappa commented on HIVE-28373:
--------------------------------------------

Change has been merged to master. Thanks [~lisoda] for the PR!! Thanks [~dkuzmenko], [~ayushtkn] for the review!!
[jira] [Updated] (HIVE-28373) Iceberg: Refactor the code of HadoopTableOptions
[ https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simhadri Govindappa updated HIVE-28373:
---------------------------------------
    Summary: Iceberg: Refactor the code of HadoopTableOptions  (was: Fix HadoopCatalog based table)
[jira] [Assigned] (HIVE-28303) Capture build scans on ge.apache.org to benefit from deep build insights
[ https://issues.apache.org/jira/browse/HIVE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simhadri Govindappa reassigned HIVE-28303:
------------------------------------------
    Assignee: Simhadri Govindappa

> Capture build scans on ge.apache.org to benefit from deep build insights
> -------------------------------------------------------------------------
>
>                Key: HIVE-28303
>                URL: https://issues.apache.org/jira/browse/HIVE-28303
>            Project: Hive
>         Issue Type: Improvement
>         Components: Build Infrastructure
>           Reporter: Gasper Kojek
>           Assignee: Simhadri Govindappa
>           Priority: Minor
>             Labels: pull-request-available
>
> This improvement will enhance the Hive build by publishing build scans to [ge.apache.org|https://ge.apache.org/], hosted by the Apache Software Foundation and run in partnership between the ASF and Gradle. This Develocity instance has all features and extensions enabled and is freely available for use by the Apache Hive project and all other Apache projects.
> On this Develocity instance, Apache Hive will have access not only to all of the published build scans but also to other aggregate data features such as:
> * Dashboards to view all historical build scans, along with performance trends over time
> * Build failure analytics for enhanced investigation and diagnosis of build failures
> * Test failure analytics to better understand trends and causes around slow, failing, and flaky tests
[jira] [Assigned] (HIVE-28303) Capture build scans on ge.apache.org to benefit from deep build insights
[ https://issues.apache.org/jira/browse/HIVE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simhadri Govindappa reassigned HIVE-28303:
------------------------------------------
    Assignee: Gasper Kojek  (was: Simhadri Govindappa)
[jira] [Resolved] (HIVE-28303) Capture build scans on ge.apache.org to benefit from deep build insights
[ https://issues.apache.org/jira/browse/HIVE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simhadri Govindappa resolved HIVE-28303.
----------------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed
[jira] [Commented] (HIVE-28303) Capture build scans on ge.apache.org to benefit from deep build insights
[ https://issues.apache.org/jira/browse/HIVE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856292#comment-17856292 ]

Simhadri Govindappa commented on HIVE-28303:
--------------------------------------------

Change has been merged to master. Thanks [~gkojek] for the PR.
[jira] [Commented] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846837#comment-17846837 ]

Simhadri Govindappa commented on HIVE-28249:
--------------------------------------------

Thanks, [~dkuzmenko] and [~zabetak] for the review and all the help :) Change is merged to master.
It looks like the jodd authors have acknowledged it as a bug: [https://github.com/oblac/jodd-util/issues/21].

> Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
> -----------------------------------------------------------------------------------------------------------
>
>                Key: HIVE-28249
>                URL: https://issues.apache.org/jira/browse/HIVE-28249
>            Project: Hive
>         Issue Type: Task
>           Reporter: Simhadri Govindappa
>           Assignee: Simhadri Govindappa
>           Priority: Major
>             Labels: pull-request-available
>
> When handling legacy timestamp conversions in Parquet, 'February 29' of year '200' is an edge case. This is because, according to [https://www.lanl.gov/Caesar/node202.html], the Julian day for 200 CE/02/29 in the Julian calendar is different from the Julian day in the Gregorian calendar:
>
> ||Date (BC/AD)||Date (CE)||Julian Day (Julian Calendar)||Julian Day (Gregorian Calendar)||
> |200 AD/02/28|200 CE/02/28|1794166|1794167|
> |200 AD/02/29|200 CE/02/29|1794167|1794168|
> |200 AD/03/01|200 CE/03/01|1794168|1794168|
> |300 AD/02/28|300 CE/02/28|1830691|1830691|
> |300 AD/02/29|300 CE/02/29|1830692|1830692|
> |300 AD/03/01|300 CE/03/01|1830693|1830692|
>
> * Because of this:
> {noformat}
> int julianDay = nt.getJulianDay(); {noformat}
> returns Julian day 1794167
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92]
> * Later:
> {noformat}
> Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat}
> {{formatter.format(date)}} returns 29-02-200, as it seems to be using the Julian calendar, but {{Timestamp.valueOf(29-02-200)}} seems to be using the Gregorian calendar and fails with a "not a leap year" exception for 29th Feb 200.
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196]
> Since Hive stores timestamps in UTC, when converting 200 CE/03/01 between timezones, Hive runs into an exception and fails with a "not a leap year" exception for 29th Feb 200 even if the actual record inserted was 200 CE/03/01 in the Asia/Singapore timezone.
>
> Full stack trace:
> {noformat}
> java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000
>     at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>     at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116)
>     at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>     at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentR {noformat}
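To make the failure mode concrete, here is a small self-contained Java sketch of the calendar mismatch described above. It is not Hive's code path (which goes through NanoTimeUtils and jodd formatting); it only shows that the same date, 0200-02-29, is valid under Julian rules but rejected by the proleptic Gregorian calendar that java.time uses, with the same "not a leap year" message.

{noformat}
import java.time.LocalDate;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarMismatchDemo {
    public static void main(String[] args) {
        // Turn GregorianCalendar into a pure Julian calendar by pushing the
        // Gregorian cutover into the far future (documented behavior).
        GregorianCalendar julian = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        julian.setGregorianChange(new Date(Long.MAX_VALUE));
        julian.clear();
        julian.set(200, Calendar.FEBRUARY, 29);
        // Accepted: 200 is a leap year under the Julian every-4-years rule.
        System.out.printf("Julian calendar accepts: %d-%02d-%02d%n",
                julian.get(Calendar.YEAR),
                julian.get(Calendar.MONTH) + 1,
                julian.get(Calendar.DAY_OF_MONTH));

        // Rejected: java.time uses the proleptic Gregorian calendar, where a
        // century year is a leap year only if divisible by 400, so 200 is not.
        try {
            LocalDate.of(200, 2, 29);
        } catch (java.time.DateTimeException e) {
            // "Invalid date 'February 29' as '200' is not a leap year"
            System.out.println("Gregorian calendar rejects: " + e.getMessage());
        }
    }
}
{noformat}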
[jira] [Resolved] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simhadri Govindappa resolved HIVE-28249.
----------------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed
[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291 ] Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 2:24 PM: - Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or earlier version of 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_sgt values ('0200-03-01 00:00:00'){noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200 and so on until 1582 when gregorian calendar was used. Where there is an overlap. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this regression. was (Author: simhadri-g): Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or earlier version of 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_sgt select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! 
qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200 and so on until 1582 when gregorian calendar was used. Where there is an overlap. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another
[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291 ] Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 2:17 PM: - Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or earlier version of 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_sgt select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200 and so on until 1582 when gregorian calendar was used. Where there is an overlap. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this regression. was (Author: simhadri-g): Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or earlier version of 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_sgt select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! 
qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not
[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291 ] Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 1:56 PM: - Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or earlier version of 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_sgt select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this regression. was (Author: simhadri-g): Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or earlier version of 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! 
qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I
[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291 ] Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 11:03 AM: -- Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or earlier version of 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this regression. was (Author: simhadri-g): Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! 
qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it
[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291 ] Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 10:47 AM: -- Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* It is relevant for other century years as well such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this regression. was (Author: simhadri-g): Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! 
qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* Other century years such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this
[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291 ] Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 10:46 AM: -- Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 (with parquet version 1.10.x). We can recreate the error by running the following: {noformat} --! qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* Other century years such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this regression. was (Author: simhadri-g): Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 with parquet version 1.10.x, we recreate the error by running the following: {noformat} --! 
qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* Other century years such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this regression. > Parquet legacy ti
[jira] [Comment Edited] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291 ] Simhadri Govindappa edited comment on HIVE-28249 at 5/10/24 10:44 AM: -- Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{{*}} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 with parquet version 1.10.x, we recreate the error by running the following: {noformat} --! qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* Other century years such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I am looking for it but haven't found the ticket that caused this regression. was (Author: simhadri-g): Hi Stamatis, Thanks for the inputs. I have updated the Jira and the github PR with the stacktrace. I have added more details in the PR description, will add it here as well. {*}Steps to reproduce:{*}{*}{*} I have provided a q file that recreates the issue in the PR as well *Step 1:* In hive 2.1.1 or 3.x with parquet 1.8 : # Create a table in Asia/Singapore Timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert record with date 0200-03-01 {noformat} insert into default.test_vj select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the datafile for this table with BDR to hive 4 with parquet version 1.10.x, we recreate the error by running the following: {noformat} --! 
qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is a cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions *Is it relevant only for 200 CE/02/29 or it affects other dates as well?* Other century years such as 200, 300, 500, 600 and so on until 1582 when gregorian calendar was used. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So according to julian calender the year 200 or 300 is a leap year but they are not a leap year according to Gregorian calendar. That is why we are seeing this issue. *Is this a regression caused by another ticket?* Not sure yet, I wasn't able to find the ticket yet. {{}} > Parquet legacy timezone conversion converts march 1st to 29
[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-28249: --- Description: When handling legacy time stamp conversions in parquet,'February 29' year '200' is an edge case. This is because, according to this: [https://www.lanl.gov/Caesar/node202.html] The Julian day for 200 CE/02/29 in the Julian calendar is different from the Julian day in Gregorian Calendar . ||Date (BC/AD)||Date (CE)||Julian Day||Julian Day|| |-| -|(Julian Calendar)|(Gregorian Calendar)| |200 AD/02/28|200 CE/02/28|1794166|1794167| |200 AD/02/29|200 CE/02/29|1794167|1794168| |200 AD/03/01|200 CE/03/01|1794168|1794168| |300 AD/02/28|300 CE/02/28|1830691|1830691| |300 AD/02/29|300 CE/02/29|1830692|1830692| |300 AD/03/01|300 CE/03/01|1830693|1830692| * Because of this: {noformat} int julianDay = nt.getJulianDay(); {noformat} returns julian day 1794167 [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92] * Later : {noformat} Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat} _{{{}formatter.format(date{}}})_ returns 29-02-200 as it seems to be using julian calendar but _{{Timestamp.valueOf(29-02-200)}}_ seems to be using gregorian calendar and fails with "not a leap year exception" for 29th Feb 200" [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196] Since hive stores timestamp in UTC, when converting 200 CE/03/01 between timezones, hive runs into an exception and fails with "not a leap year exception" for 29th Feb 200 even if the actual record inserted was 200 CE/03/01 in Asia/Singapore timezone. Fullstack trace: {noformat} java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000 at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210) at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentR
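The calendar mismatch above can be reproduced outside Hive with a minimal, self-contained Java sketch (illustrative only, not Hive code): java.util.GregorianCalendar is a hybrid calendar that applies Julian rules before the October 1582 cutover and therefore accepts February 29 of year 200, while java.time uses the proleptic Gregorian calendar and rejects the same date with the "not a leap year" message seen in the stack trace.
{code:java}
import java.time.LocalDate;
import java.util.GregorianCalendar;

public class LeapYearMismatch {
    public static void main(String[] args) {
        // Hybrid calendar: Julian rules before the October 1582 cutover,
        // so every 4th year -- including year 200 -- is a leap year.
        GregorianCalendar hybrid = new GregorianCalendar();
        System.out.println(hybrid.isLeapYear(200));   // true

        // java.time is proleptic Gregorian: March 1st of year 200 is valid...
        System.out.println(LocalDate.of(200, 3, 1));  // prints 0200-03-01

        // ...but February 29 of year 200 is not, and throws
        // java.time.DateTimeException:
        // "Invalid date 'February 29' as '200' is not a leap year"
        System.out.println(LocalDate.of(200, 2, 29));
    }
}
{code}
A converter that formats a Julian day with the first calendar and then re-parses the result with the second therefore shifts 200-03-01 back to the invalid 200-02-29, which is exactly the failure described above.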
[jira] [Commented] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845291#comment-17845291 ] Simhadri Govindappa commented on HIVE-28249: Hi Stamatis, Thanks for the inputs. I have updated the Jira and the GitHub PR with the stack trace. I have added more details in the PR description and will add them here as well. *Steps to reproduce:* I have provided a q file that recreates the issue in the PR as well. *Step 1:* In Hive 2.1.1 or 3.x with Parquet 1.8: # Create a table in the Asia/Singapore timezone: {noformat} create table default.test_sgt(currtime timestamp) stored as parquet;{noformat} # Insert a record with date 0200-03-01: {noformat} insert into default.test_sgt select '0200-03-01 00:00:00'{noformat} *Step 2:* After migrating the data file for this table with BDR to Hive 4 with Parquet 1.10.x, we recreate the error by running the following: {noformat} --! qt:timezone:Asia/Singapore CREATE EXTERNAL TABLE `TEST_SGT`(`currtime` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; LOAD DATA LOCAL INPATH '../../data/files/sgt000' INTO TABLE TEST_SGT; SELECT * FROM TEST_SGT;{noformat} *Does this problem affect older versions of Hive?* Yes, it affects the alpha and beta releases as well. *Does it appear when reading and writing of Parquet data happens with the same version?* No, we see the issue only when there is cross-version writing and reading. *Does it require some specific properties to be set when reading/writing?* No, but it does depend on legacy conversions. *Is it relevant only for 200 CE/02/29 or does it affect other dates as well?* It affects other century years such as 200, 300, 500, 600, and so on, up to 1582 when the Gregorian calendar came into use. The Julian calendar defines a leap year as once every four years. The Gregorian calendar modified the addition of leap days, such that a century year was only counted as a leap year if it was also divisible by 400. So, according to the Julian calendar, the years 200 and 300 are leap years, but they are not leap years according to the Gregorian calendar. That is why we are seeing this issue (see the sketch after this message). *Is this a regression caused by another ticket?* Not sure yet; I wasn't able to find a related ticket. > Parquet legacy timezone conversion converts march 1st to 29th feb and fails > with not a leap year exception > -- > > Key: HIVE-28249 > URL: https://issues.apache.org/jira/browse/HIVE-28249 > Project: Hive > Issue Type: Task >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > > When handling legacy time stamp conversions in parquet,'February 29' year > '200' is an edge case. > This is because, according to this: [https://www.lanl.gov/Caesar/node202.html] > The Julian day for 200 CE/02/29 in the Julian calendar is different from the > Julian day in Gregorian Calendar . 
> ||Date (BC/AD)||Date (CE)||Julian Day||Julian Day|| > |-| -|(Julian Calendar)|(Gregorian Calendar)| > |200 AD/02/28|200 CE/02/28|1794166|1794167| > |200 AD/02/29|200 CE/02/29|1794167|1794168| > |200 AD/03/01|200 CE/03/01|1794168|1794168| > |300 AD/02/28|300 CE/02/28|1830691|1830691| > |300 AD/02/29|300 CE/02/29|1830692|1830692| > |300 AD/03/01|300 CE/03/01|1830693|1830692| > > * Because of this: > {noformat} > int julianDay = nt.getJulianDay(); {noformat} > returns julian day 1794167 > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92] > * Later : > {noformat} > Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat} > {{formatter.format(date)}} returns 29-02-200 as it seems to be using > julian calendar > but _{{Timestamp.valueOf(29-02-200)}}_ seems to be using gregorian calendar > and fails with "not a leap year exception" for 29th Feb 200 > [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196] > Since hive stores timestamp in UTC, when converting 200 CE/03/01 between > timezones, hive runs into an exception and fails with "not a leap year > exception" for 29th Feb 200 even if the actual record inserted was 200 > CE/03/01 in Asia/Singapore timezone. > > Fullstack trace: > {noformat} > java.lang.RuntimeException: java.io.IOException: > org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in > block -1 in file > file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtes
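As a quick illustration of the two leap-year rules described in the comment above (a sketch, not Hive code), the following loop prints every century year up to 1500 on which the two calendars disagree:
{code:java}
public class LeapRules {
    // Julian rule: every 4th year is a leap year.
    static boolean julianLeap(int year) {
        return year % 4 == 0;
    }

    // Gregorian rule: century years are leap years only if divisible by 400.
    static boolean gregorianLeap(int year) {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }

    public static void main(String[] args) {
        for (int year = 100; year <= 1500; year += 100) {
            if (julianLeap(year) != gregorianLeap(year)) {
                System.out.println(year + " is a leap year only in the Julian calendar");
            }
        }
        // Prints every century year except 400, 800, and 1200 -- i.e. 200,
        // 300, 500, 600, and so on, matching the affected dates above.
    }
}
{code}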
[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-28249: --- Description: When handling legacy timestamp conversions in Parquet, 'February 29' of year '200' is an edge case. This is because, according to [https://www.lanl.gov/Caesar/node202.html], the Julian day for 200 CE/02/29 in the Julian calendar is different from the Julian day in the Gregorian calendar. ||Date (BC/AD)||Date (CE)||Julian Day||Julian Day|| |-| -|(Julian Calendar)|(Gregorian Calendar)| |200 AD/02/28|200 CE/02/28|1794166|1794167| |200 AD/02/29|200 CE/02/29|1794167|1794168| |200 AD/03/01|200 CE/03/01|1794168|1794168| |300 AD/02/28|300 CE/02/28|1830691|1830691| |300 AD/02/29|300 CE/02/29|1830692|1830692| |300 AD/03/01|300 CE/03/01|1830693|1830692| Because of this: {noformat} int julianDay = nt.getJulianDay(); {noformat} returns Julian day 1794167 [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92] Later: {noformat} Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat} {{formatter.format(date)}} returns 29-02-200, as it seems to be using the Julian calendar, but {{Timestamp.valueOf(29-02-200)}} seems to be using the Gregorian calendar and fails with a "not a leap year" exception for 29th Feb 200 [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196] Since Hive stores timestamps in UTC, when converting 200 CE/03/01 between timezones, Hive runs into this exception and fails with a "not a leap year" exception for 29th Feb 200 even if the actual record inserted was 200 CE/03/01 in the Asia/Singapore timezone. Full stack trace: {noformat} java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000 at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210) at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.juni
[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-28249: --- Description: When handling legacy timezone conversions in parquet, 'February 29' year '200' is an edge case. This is because, according to this: [https://www.lanl.gov/Caesar/node202.html] The Julian day for 200 CE/02/29 in the Julian calendar is different from the Julian day in Gregorian Calendar . |Date (BC/AD)|Date (CE)|Julian Day|Julian Day| | | |(Julian Calendar)|(Gregorian Calendar)| |200 AD/02/28|200 CE/02/28|1794166|1794167| |200 AD/02/29|200 CE/02/29|1794167|1794168| |200 AD/03/01|200 CE/03/01|1794168|1794168| As a result since hive stores timestamp in UTC, when converting 200 CE/03/01 between timezones, hive runs into an exception and fails with "not a leap year exception" for 29th Feb 200 even if the actual record inserted was 200 CE/03/01 in Asia/Singapore timezone. Fullstack trace: {noformat} java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000 at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210) at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:95) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.apache.maven
[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
[ https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-28249: --- Description: When handling legacy timezone conversions in parquet, 'February 29' year '200' is an edge case. This is because, according to this: [https://www.lanl.gov/Caesar/node202.html] The Julian day for 200 CE/02/29 in the Julian calendar is different from the Julian day in Gregorian Calendar . |Date (BC/AD)|Date (CE)|Julian Day|Julian Day| | | |(Julian Calendar)|(Gregorian Calendar)| |200 AD/02/28|200 CE/02/28|1794166|1794167| |200 AD/02/29|200 CE/02/29|1794167|1794168| |200 AD/03/01|200 CE/03/01|1794168|1794168| As a result since hive stores timestamp in UTC, when converting 200 CE/03/01 between timezones, hive runs into an exception and fails with "not a leap year exception" for 29th Feb 200 even if the actual record inserted was 200 CE/03/01 in Asia/Singapore timezone. > Parquet legacy timezone conversion converts march 1st to 29th feb and fails > with not a leap year exception > -- > > Key: HIVE-28249 > URL: https://issues.apache.org/jira/browse/HIVE-28249 > Project: Hive > Issue Type: Task >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > When handling legacy timezone conversions in parquet, 'February 29' year > '200' is an edge case. > This is because, according to this: [https://www.lanl.gov/Caesar/node202.html] > The Julian day for 200 CE/02/29 in the Julian calendar is different from the > Julian day in Gregorian Calendar . > |Date (BC/AD)|Date (CE)|Julian Day|Julian Day| > | | |(Julian Calendar)|(Gregorian Calendar)| > |200 AD/02/28|200 CE/02/28|1794166|1794167| > |200 AD/02/29|200 CE/02/29|1794167|1794168| > |200 AD/03/01|200 CE/03/01|1794168|1794168| > As a result since hive stores timestamp in UTC, when converting 200 CE/03/01 > between timezones, hive runs into an exception and fails with "not a leap > year exception" for 29th Feb 200 even if the actual record inserted was 200 > CE/03/01 in Asia/Singapore timezone. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception
Simhadri Govindappa created HIVE-28249: -- Summary: Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception Key: HIVE-28249 URL: https://issues.apache.org/jira/browse/HIVE-28249 Project: Hive Issue Type: Task Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28211) Restore hive-exec-core jar
Simhadri Govindappa created HIVE-28211: -- Summary: Restore hive-exec-core jar Key: HIVE-28211 URL: https://issues.apache.org/jira/browse/HIVE-28211 Project: Hive Issue Type: Task Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa The hive-exec-core jar is used by Spark, Oozie, Hudi, and many other projects. Removal of the hive-exec-core jar has caused the following issues: Spark: [https://lists.apache.org/list?d...@hive.apache.org:lte=1M:joda] Oozie: [https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg] Hudi: [apache/hudi#8147|https://github.com/apache/hudi/issues/8147] Until we shade & relocate dependencies in hive-exec, we should restore the hive-exec-core jar. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
[ https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa resolved HIVE-28153. Fix Version/s: 4.1.0 Resolution: Fixed > Flaky test TestConflictingDataFiles.testMultiFiltersUpdate > -- > > Key: HIVE-28153 > URL: https://issues.apache.org/jira/browse/HIVE-28153 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Butao Zhang >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > This test has been failing a lot lately, such as > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/] > > And the flaky test shows this test is unstable: > [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/] > {code:java} > 10:29:21 [INFO] T E S T S > 10:29:21 [INFO] --- > 10:29:21 [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time > elapsed: 399.12 s <<< FAILURE! - in > org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET, > engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1] Time > elapsed: 11.781 s <<< FAILURE! > 10:36:13 java.lang.AssertionError: expected:<12> but was:<13> > 10:36:13 at org.junit.Assert.fail(Assert.java:89) > 10:36:13 at org.junit.Assert.failNotEquals(Assert.java:835) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:647) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:633) > 10:36:13 at > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135) > 10:36:13 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 10:36:13 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 10:36:13 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 10:36:13 at java.lang.reflect.Method.invoke(Method.java:498) > 10:36:13 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 10:36:13 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 10:36:13 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 10:36:13 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 10:36:13 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 10:36:13 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 10:36:13 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > 10:36:13 at java.util.concurrent.FutureTask.run(FutureTask.java:266) > 10:36:13 at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
[ https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837165#comment-17837165 ] Simhadri Govindappa commented on HIVE-28153: Change is merged to master. Thanks, [~dkuzmenko] and [~zhangbutao] for the review! Additionally, I have raised HIVE-28192 to investigate the bug mentioned above. It seems like the IOContext is shared between threads in non-vectorized code flow which is causing duplicate records. > Flaky test TestConflictingDataFiles.testMultiFiltersUpdate > -- > > Key: HIVE-28153 > URL: https://issues.apache.org/jira/browse/HIVE-28153 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Butao Zhang >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > > This test has been failing a lot lately, such as > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/] > > And the flaky test shows this test is unstable: > [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/] > {code:java} > 10:29:21 [INFO] T E S T S > 10:29:21 [INFO] --- > 10:29:21 [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time > elapsed: 399.12 s <<< FAILURE! - in > org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET, > engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1] Time > elapsed: 11.781 s <<< FAILURE! > 10:36:13 java.lang.AssertionError: expected:<12> but was:<13> > 10:36:13 at org.junit.Assert.fail(Assert.java:89) > 10:36:13 at org.junit.Assert.failNotEquals(Assert.java:835) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:647) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:633) > 10:36:13 at > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135) > 10:36:13 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 10:36:13 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 10:36:13 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 10:36:13 at java.lang.reflect.Method.invoke(Method.java:498) > 10:36:13 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 10:36:13 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 10:36:13 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 10:36:13 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 10:36:13 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 10:36:13 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 10:36:13 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > 10:36:13 at java.util.concurrent.FutureTask.run(FutureTask.java:266) > 10:36:13 at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
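The failure mode suspected in the comment above (one mutable context object shared by concurrent reader threads) can be sketched in plain Java. This is a hypothetical illustration, not Hive's actual IOContext code; confining the state with ThreadLocal is the usual remedy for this pattern:
{code:java}
public class SharedContextSketch {
    // Stand-in for a context object shared by all reader threads.
    static long sharedPosition;

    // Per-thread alternative: each thread sees only its own value.
    static final ThreadLocal<Long> localPosition = ThreadLocal.withInitial(() -> 0L);

    public static void main(String[] args) throws InterruptedException {
        Runnable reader = () -> {
            long mine = Thread.currentThread().getId();
            sharedPosition = mine;    // racy: the other thread may overwrite this
            localPosition.set(mine);  // safe: confined to this thread
            System.out.printf("thread %d sees shared=%d local=%d%n",
                    mine, sharedPosition, localPosition.get());
        };
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // "shared" may report the other thread's id; "local" never does. A
        // reader that keeps row positions in shared state this way can emit
        // a row twice, which matches the duplicate-record symptom above.
    }
}
{code}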
[jira] [Created] (HIVE-28192) Iceberg: Fix thread safety issue with PositionDeleteInfo in IOContext
Simhadri Govindappa created HIVE-28192: -- Summary: Iceberg: Fix thread safety issue with PositionDeleteInfo in IOContext Key: HIVE-28192 URL: https://issues.apache.org/jira/browse/HIVE-28192 Project: Hive Issue Type: Task Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
[ https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832862#comment-17832862 ] Simhadri Govindappa edited comment on HIVE-28153 at 4/1/24 2:34 PM: I investigated the issue. Looks like the test caught a new bug in the code of the iceberg delete writer. I will add more details soon. was (Author: simhadri-g): I investigated the issue. Looks like the test caught the new bug in the code of the iceberg delete writer. I will add more details soon. > Flaky test TestConflictingDataFiles.testMultiFiltersUpdate > -- > > Key: HIVE-28153 > URL: https://issues.apache.org/jira/browse/HIVE-28153 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Butao Zhang >Assignee: Simhadri Govindappa >Priority: Major > > This test has been failing a lot lately, such as > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/] > > And the flaky test shows this test is unstable: > [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/] > {code:java} > 10:29:21 [INFO] T E S T S > 10:29:21 [INFO] --- > 10:29:21 [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time > elapsed: 399.12 s <<< FAILURE! - in > org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET, > engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1] Time > elapsed: 11.781 s <<< FAILURE! > 10:36:13 java.lang.AssertionError: expected:<12> but was:<13> > 10:36:13 at org.junit.Assert.fail(Assert.java:89) > 10:36:13 at org.junit.Assert.failNotEquals(Assert.java:835) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:647) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:633) > 10:36:13 at > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135) > 10:36:13 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 10:36:13 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 10:36:13 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 10:36:13 at java.lang.reflect.Method.invoke(Method.java:498) > 10:36:13 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 10:36:13 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 10:36:13 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 10:36:13 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 10:36:13 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 10:36:13 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 10:36:13 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > 10:36:13 at java.util.concurrent.FutureTask.run(FutureTask.java:266) > 10:36:13 at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
[ https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832862#comment-17832862 ] Simhadri Govindappa commented on HIVE-28153: I investigated the issue. Looks like the test caught the new bug in the code of the iceberg delete writer. I will add more details soon. > Flaky test TestConflictingDataFiles.testMultiFiltersUpdate > -- > > Key: HIVE-28153 > URL: https://issues.apache.org/jira/browse/HIVE-28153 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Butao Zhang >Assignee: Simhadri Govindappa >Priority: Major > > This test has been failing a lot lately, such as > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/] > > And the flaky test shows this test is unstable: > [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/] > {code:java} > 10:29:21 [INFO] T E S T S > 10:29:21 [INFO] --- > 10:29:21 [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time > elapsed: 399.12 s <<< FAILURE! - in > org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET, > engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1] Time > elapsed: 11.781 s <<< FAILURE! > 10:36:13 java.lang.AssertionError: expected:<12> but was:<13> > 10:36:13 at org.junit.Assert.fail(Assert.java:89) > 10:36:13 at org.junit.Assert.failNotEquals(Assert.java:835) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:647) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:633) > 10:36:13 at > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135) > 10:36:13 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 10:36:13 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 10:36:13 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 10:36:13 at java.lang.reflect.Method.invoke(Method.java:498) > 10:36:13 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 10:36:13 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 10:36:13 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 10:36:13 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 10:36:13 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 10:36:13 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 10:36:13 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > 10:36:13 at java.util.concurrent.FutureTask.run(FutureTask.java:266) > 10:36:13 at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
[ https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa reassigned HIVE-28153: -- Assignee: Simhadri Govindappa > Flaky test TestConflictingDataFiles.testMultiFiltersUpdate > -- > > Key: HIVE-28153 > URL: https://issues.apache.org/jira/browse/HIVE-28153 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Butao Zhang >Assignee: Simhadri Govindappa >Priority: Major > > This test has been failing a lot lately, such as > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/] > > And the flaky test shows this test is unstable: > [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/] > {code:java} > 10:29:21 [INFO] T E S T S > 10:29:21 [INFO] --- > 10:29:21 [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time > elapsed: 399.12 s <<< FAILURE! - in > org.apache.iceberg.mr.hive.TestConflictingDataFiles > 10:36:13 [ERROR] > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET, > engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1] Time > elapsed: 11.781 s <<< FAILURE! > 10:36:13 java.lang.AssertionError: expected:<12> but was:<13> > 10:36:13 at org.junit.Assert.fail(Assert.java:89) > 10:36:13 at org.junit.Assert.failNotEquals(Assert.java:835) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:647) > 10:36:13 at org.junit.Assert.assertEquals(Assert.java:633) > 10:36:13 at > org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135) > 10:36:13 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 10:36:13 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 10:36:13 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 10:36:13 at java.lang.reflect.Method.invoke(Method.java:498) > 10:36:13 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 10:36:13 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 10:36:13 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 10:36:13 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 10:36:13 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 10:36:13 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 10:36:13 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > 10:36:13 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > 10:36:13 at java.util.concurrent.FutureTask.run(FutureTask.java:266) > 10:36:13 at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27929) Run TPC-DS queries and validate results correctness
[ https://issues.apache.org/jira/browse/HIVE-27929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830442#comment-17830442 ] Simhadri Govindappa commented on HIVE-27929: Hi, I have rerun the 1TB TPC-DS test for branch-4.0 [https://github.com/apache/hive/tree/branch-4.0]. All the queries ran successfully with the correct results. Please find the details of the test below: # Created a 1 TB TPC-DS dataset (.dat files) # Loaded the data files into text tables # Created external ORC TPC-DS tables and loaded the data from the text tables # Ran the 99 TPC-DS queries > Run TPC-DS queries and validate results correctness > --- > > Key: HIVE-27929 > URL: https://issues.apache.org/jira/browse/HIVE-27929 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Assignee: Simhadri Govindappa >Priority: Major > > release branch: *branch-4.0* > https://github.com/apache/hive/tree/branch-4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27953) Retire https://apache.github.io sites and remove obsolete content/actions
[ https://issues.apache.org/jira/browse/HIVE-27953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829802#comment-17829802 ] Simhadri Govindappa commented on HIVE-27953: Thanks [~zabetak] , for helping with the reviews! :) > Retire https://apache.github.io sites and remove obsolete content/actions > - > > Key: HIVE-27953 > URL: https://issues.apache.org/jira/browse/HIVE-27953 > Project: Hive > Issue Type: Task > Components: Documentation >Reporter: Stamatis Zampetakis >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > Currently there are three versions of the Hive website (populated from > different places and in various ways) available online. Below, I outline the > entry point URLs along with the latest commit that lead to the deployment > each version. > ||URL||Commit|| > |https://hive.apache.org/|https://github.com/apache/hive-site/commit/0162552c68006fd30411033d5e6a3d6806026851| > |https://apache.github.io/hive/|https://github.com/apache/hive/commit/1455f6201b0f7b061361bc9acc23cb810ff02483| > |https://apache.github.io/hive-site/|https://github.com/apache/hive-site/commit/95b1c8385fa50c2e59579899d2fd297b8a2ecefd| > People searching online for Hive may end-up in any of the above risking to > see pretty outdated information about the project. > For Hive developers (especially newcomers) it is very difficult to figure out > where they should apply their changes if they want to change something in the > website. Even people experienced with the various offering of ASF and GitHub > may have a hard time figuring things out. > I propose to retire/shutdown all GitHub pages deployments > (https://apache.github.io) and drop all content/branches that are not > relevant for the main website under https://hive.apache.org/. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28087) Hive Iceberg: Insert into partitioned table fails if the data is not clustered
[ https://issues.apache.org/jira/browse/HIVE-28087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-28087: --- Description: Insert into partitioned table fails with the following error if the data is not clustered. *Using cluster by clause it succeeds :* {noformat} 0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 select t, ts from t1 cluster by ts; -- VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -- Map 1 .. container SUCCEEDED 1 100 0 0 Reducer 2 .. container SUCCEEDED 1 100 0 0 -- VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 9.47 s -- INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode INFO : Starting task [Stage-0:MOVE] in serial mode INFO : Completed executing command(queryId=root_20240222123244_0c448b32-4fd9-420d-be31-e39e2972af82); Time taken: 10.534 seconds 100 rows affected (10.696 seconds){noformat} *Without cluster By it fails:* {noformat} 0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 select t, ts from t1; -- VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -- Map 1 .. container SUCCEEDED 1 100 0 0 Reducer 2container RUNNING 1 010 2 0 -- VERTICES: 01/02 [=>>-] 50% ELAPSED TIME: 9.53 s -- Caused by: java.lang.IllegalStateException: Incoming records violate the writer assumption that records are clustered by spec and by partition within each spec. Either cluster the incoming records or switch to fanout writers. Encountered records that belong to already closed files: partition 'ts_month=2027-03' in spec [ 1000: ts_month: month(2) ] at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:96) at org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:31) at org.apache.iceberg.mr.hive.writer.HiveIcebergRecordWriter.write(HiveIcebergRecordWriter.java:53) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1181) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:502) ... 20 more{noformat} A simple repro, using the attached csv file: [^query-hive-377.csv] {noformat} create database t3; use t3; create table vector1k( t int, si int, i int, b bigint, f float, d double, dc decimal(38,18), bo boolean, s string, s2 string, ts timestamp, ts2 timestamp, dt date) row format delimited fields terminated by ','; load data local inpath "/query-hive-377.csv" OVERWRITE into table vector1k; select * from vector1k; create table vectortab10k( t int, si int, i int, b bigint, f float, d double, dc decimal(38,18), bo boolean, s string, s2 string, ts timestamp, ts2 timestamp, dt date) stored by iceberg stored as orc; insert into vectortab10k select * from vector1k; select count(*) from vectortab10k ; create table partition_transform_4(t int, ts timestamp) partitioned by spec(month(ts)) stored by iceberg; insert into table partition_transform_4 select t, ts from vectortab10k ; {noformat} was: Insert into partitioned table fails with the following error if the data is not clustered. *Using cluster by clause it succeeds :* {noformat} 0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 select t, ts from t1 cluster by ts; -
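The IllegalStateException above reflects the invariant that a clustered writer enforces: it keeps at most one data file open per partition and closes it as soon as the partition key changes, so input that revisits an already-closed partition is rejected. A minimal sketch of that invariant (illustrative only, not the Iceberg ClusteredWriter itself):
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClusteredWriterSketch {
    public static void main(String[] args) {
        // Unsorted input revisits partition 2027-03 after it was closed.
        List<String> partitionKeys =
                Arrays.asList("ts_month=2027-03", "ts_month=2027-04", "ts_month=2027-03");

        Set<String> closed = new HashSet<>();
        String open = null;
        for (String key : partitionKeys) {
            if (!key.equals(open)) {
                if (open != null) {
                    closed.add(open);  // close the file of the previous partition
                }
                if (closed.contains(key)) {
                    throw new IllegalStateException(
                            "Incoming records are not clustered: partition '"
                                    + key + "' is already closed");
                }
                open = key;            // start a new file for this partition
            }
            // write the record to the file currently open for this partition
        }
    }
}
{code}
CLUSTER BY ts sorts the rows so that all records of one partition arrive together, which is why the first query succeeds; the fanout writers mentioned in the error message avoid the sort by keeping one file open per partition instead.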
[jira] [Commented] (HIVE-28107) Remove the docs directory
[ https://issues.apache.org/jira/browse/HIVE-28107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824325#comment-17824325 ] Simhadri Govindappa commented on HIVE-28107: Thanks [~zabetak] , I will update the PR. > Remove the docs directory > - > > Key: HIVE-28107 > URL: https://issues.apache.org/jira/browse/HIVE-28107 > Project: Hive > Issue Type: Task > Components: Hive >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Fix For: Not Applicable > > > The doc directory was used to host the old hive website. > Since the revamped hive website was moved to and hosted from > [https://github.com/apache/hive-site/] for almost a year now without any > issues, this docs directory in the main repo is no longer required. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-28107) Remove the docs directory
[ https://issues.apache.org/jira/browse/HIVE-28107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824314#comment-17824314 ] Simhadri Govindappa edited comment on HIVE-28107 at 3/7/24 8:51 AM: Hi [~zabetak] , Yes this seems to be a duplicate. Sorry, I was not aware of HIVE-27953 . Shall I mark this Jira as duplicate and close the PR? was (Author: simhadri-g): Hi [~zabetak] , Yes this seems to be a duplicate, I was not aware of HIVE-27953 . Shall I mark this Jira as duplicate and close the PR? > Remove the docs directory > - > > Key: HIVE-28107 > URL: https://issues.apache.org/jira/browse/HIVE-28107 > Project: Hive > Issue Type: Task > Components: Hive >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > > The doc directory was used to host the old hive website. > Since the revamped hive website was moved to and hosted from > [https://github.com/apache/hive-site/] for almost a year now without any > issues, this docs directory in the main repo is no longer required. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28107) Remove the docs directory
[ https://issues.apache.org/jira/browse/HIVE-28107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824314#comment-17824314 ] Simhadri Govindappa commented on HIVE-28107: Hi [~zabetak] , Yes this seems to be a duplicate, I was not aware of HIVE-27953 . Shall I mark this Jira as duplicate and close the PR? > Remove the docs directory > - > > Key: HIVE-28107 > URL: https://issues.apache.org/jira/browse/HIVE-28107 > Project: Hive > Issue Type: Task > Components: Hive >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > > The doc directory was used to host the old hive website. > Since the revamped hive website was moved to and hosted from > [https://github.com/apache/hive-site/] for almost a year now without any > issues, this docs directory in the main repo is no longer required. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28107) Remove the docs directory
Simhadri Govindappa created HIVE-28107: -- Summary: Remove the docs directory Key: HIVE-28107 URL: https://issues.apache.org/jira/browse/HIVE-28107 Project: Hive Issue Type: Task Components: Hive Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa The doc directory was used to host the old hive website. Since the revamped hive website was moved to and hosted from [https://github.com/apache/hive-site/] for almost a year now without any issues, this docs directory in the main repo is no longer required. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28087) Hive Iceberg: Insert into partitioned table fails if the data is not clustered
[ https://issues.apache.org/jira/browse/HIVE-28087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-28087: --- Description: Insert into partitioned table fails with the following error if the data is not clustered. *Using cluster by clause it succeeds :* {noformat} 0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 select t, ts from t1 cluster by ts; -- VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -- Map 1 .. container SUCCEEDED 1 100 0 0 Reducer 2 .. container SUCCEEDED 1 100 0 0 -- VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 9.47 s -- INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode INFO : Starting task [Stage-0:MOVE] in serial mode INFO : Completed executing command(queryId=root_20240222123244_0c448b32-4fd9-420d-be31-e39e2972af82); Time taken: 10.534 seconds 100 rows affected (10.696 seconds){noformat} *Without cluster By it fails:* {noformat} 0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 select t, ts from t1; -- VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -- Map 1 .. container SUCCEEDED 1 100 0 0 Reducer 2container RUNNING 1 010 2 0 -- VERTICES: 01/02 [=>>-] 50% ELAPSED TIME: 9.53 s -- Caused by: java.lang.IllegalStateException: Incoming records violate the writer assumption that records are clustered by spec and by partition within each spec. Either cluster the incoming records or switch to fanout writers. Encountered records that belong to already closed files: partition 'ts_month=2027-03' in spec [ 1000: ts_month: month(2) ] at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:96) at org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:31) at org.apache.iceberg.mr.hive.writer.HiveIcebergRecordWriter.write(HiveIcebergRecordWriter.java:53) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1181) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:502) ... 20 more{noformat} A simple repro, using the attached csv file: [^query-hive-377.csv] {noformat} create database t3; use t3; create table vector1k( t int, si int, i int, b bigint, f float, d double, dc decimal(38,18), bo boolean, s string, s2 string, ts timestamp, ts2 timestamp, dt date) row format delimited fields terminated by ','; load data local inpath "/query-hive-377.csv" OVERWRITE into table vector1k; select * from vector1k; create table vectortab10k( t int, si int, i int, b bigint, f float, d double, dc decimal(38,18), bo boolean, s string, s2 string, ts timestamp, ts2 timestamp, dt date) stored by iceberg stored as orc; insert into vectortab10k select * from vector1k;select count(*) from vectortab10k limit 10; create table partition_transform_4(t int, ts timestamp) partitioned by spec(month(ts)) stored by iceberg; insert into table partition_transform_4 select t, ts from vectortab10k ; {noformat} was: Insert into partitioned table fails with the following error if the data is not clustered. {noformat} Caused by: java.lang.IllegalStateException: Incoming records violate the writer assumption that records are clustered by spec and by partition within each spec.
[jira] [Created] (HIVE-28087) Hive Iceberg: Insert into partitioned table fails if the data is not clustered
Simhadri Govindappa created HIVE-28087: -- Summary: Hive Iceberg: Insert into partitioned table fails if the data is not clustered Key: HIVE-28087 URL: https://issues.apache.org/jira/browse/HIVE-28087 Project: Hive Issue Type: Task Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa Attachments: query-hive-377.csv Insert into partitioned table fails with the following error if the data is not clustered. {noformat} Caused by: java.lang.IllegalStateException: Incoming records violate the writer assumption that records are clustered by spec and by partition within each spec. Either cluster the incoming records or switch to fanout writers. Encountered records that belong to already closed files: partition 'ts_month=2027-03' in spec [ 1000: ts_month: month(2) ] at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:96) at org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:31) at org.apache.iceberg.mr.hive.writer.HiveIcebergRecordWriter.write(HiveIcebergRecordWriter.java:53) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1181) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:502) ... 20 more{noformat} A simple repro, using the attached csv file: [^query-hive-377.csv] {noformat} create database t3; use t3; create table vector1k( t int, si int, i int, b bigint, f float, d double, dc decimal(38,18), bo boolean, s string, s2 string, ts timestamp, ts2 timestamp, dt date) row format delimited fields terminated by ','; load data local inpath "/query-hive-377.csv" OVERWRITE into table vector1k; select * from vector1k; create table vectortab10k( t int, si int, i int, b bigint, f float, d double, dc decimal(38,18), bo boolean, s string, s2 string, ts timestamp, ts2 timestamp, dt date) stored by iceberg stored as orc; insert into vectortab10k select * from vector1k;select count(*) from vectortab10k limit 10; create table partition_transform_4(t int, ts timestamp) partitioned by spec(month(ts)) stored by iceberg; insert into table partition_transform_4 select t, ts from vectortab10k ; {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-28048) Hive cannot run ORDER BY queries on Iceberg tables partitioned by decimal columns
[ https://issues.apache.org/jira/browse/HIVE-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812500#comment-17812500 ] Simhadri Govindappa edited comment on HIVE-28048 at 1/30/24 10:36 PM: -- This seems to be the same as https://issues.apache.org/jira/browse/HIVE-27938 I have a PR for this, please help with the review: [https://github.com/apache/hive/pull/5048] Thanks! was (Author: simhadri-g): This seems to be the same as https://issues.apache.org/jira/browse/HIVE-27938 I have a PR up for review. [https://github.com/apache/hive/pull/5048] > Hive cannot run ORDER BY queries on Iceberg tables partitioned by decimal > columns > - > > Key: HIVE-28048 > URL: https://issues.apache.org/jira/browse/HIVE-28048 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: iceberg > > Repro: > {noformat} > create table test_dec (d decimal(8,4), i int) > partitioned by spec (d) > stored by iceberg; > insert into test_dec values (3.4, 5), (4.5, 6); > select * from test_dec order by i; > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28048) Hive cannot run ORDER BY queries on Iceberg tables partitioned by decimal columns
[ https://issues.apache.org/jira/browse/HIVE-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812500#comment-17812500 ] Simhadri Govindappa commented on HIVE-28048: This seems to be the same as https://issues.apache.org/jira/browse/HIVE-27938 I have a PR up for review. [https://github.com/apache/hive/pull/5048] > Hive cannot run ORDER BY queries on Iceberg tables partitioned by decimal > columns > - > > Key: HIVE-28048 > URL: https://issues.apache.org/jira/browse/HIVE-28048 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: iceberg > > Repro: > {noformat} > create table test_dec (d decimal(8,4), i int) > partitioned by spec (d) > stored by iceberg; > insert into test_dec values (3.4, 5), (4.5, 6); > select * from test_dec order by i; > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27938) Iceberg: Fix java.lang.ClassCastException during vectorized reads on partition columns
[ https://issues.apache.org/jira/browse/HIVE-27938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27938: --- Summary: Iceberg: Fix java.lang.ClassCastException during vectorized reads on partition columns (was: Iceberg: Date type Partitioned column throws java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date) > Iceberg: Fix java.lang.ClassCastException during vectorized reads on > partition columns > --- > > Key: HIVE-27938 > URL: https://issues.apache.org/jira/browse/HIVE-27938 > Project: Hive > Issue Type: Bug >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > {code:java} > 1: jdbc:hive2://localhost:10001/> CREATE EXTERNAL TABLE ice3 (`col1` int, > `calday` date) PARTITIONED BY SPEC (calday) stored by iceberg > tblproperties('format-version'='2'); > 1: jdbc:hive2://localhost:10001/>insert into ice3 values(1, '2020-11-20'); > 1: jdbc:hive2://localhost:10001/> select count(calday) from ice3; > {code} > Full stack trace: > {code:java} > INFO : Compiling > command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): > select count(calday) from ice3INFO : No Stats for default@ice3, Columns: > caldayINFO : Semantic Analysis Completed (retrial = false)INFO : Created > Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, > comment:null)], properties:null)INFO : Completed compiling > command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab); > Time taken: 0.196 secondsINFO : Operation QUERY obtained 0 locksINFO : > Executing > command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): > select count(calday) from ice3INFO : Query ID = > root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feabINFO : Total jobs = > 1INFO : Launching Job 1 out of 1INFO : Starting task [Stage-1:MAPRED] in > serial modeINFO : Subscribed to counters: [] for queryId: > root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feabINFO : Session is > already openINFO : Dag name: select count(calday) from ice3 (Stage-1)INFO : > HS2 Host: [localhost], Query ID: > [root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab], Dag ID: > [dag_1701888162260_0001_2], DAG Session ID: > [application_1701888162260_0001]INFO : Status: Running (Executing on YARN > cluster with App id application_1701888162260_0001) > -- > VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING > FAILED > KILLED--Map > 1 container RUNNING 1 0 0 1 > 4 0Reducer 2 container INITED 1 0 > 0 1 0 > 0--VERTICES: > 00/02 [>>--] 0% ELAPSED TIME: 1.41 > s--ERROR > : Status: FailedERROR : Vertex failed, vertexName=Map 1, > vertexId=vertex_1701888162260_0001_2_00, diagnostics=[Task failed, > taskId=task_1701888162260_0001_2_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Error: Error while running task ( failure ) : > attempt_1701888162260_0001_2_00_00_0:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.lang.ClassCastException: java.time.LocalDate cannot be cast to > org.apache.hadoop.hive.common.type.Dateat > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) >at > 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) >at java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent
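For context on the (truncated) trace above: the root cause is a plain type mismatch. When reading DATE partition columns, the Iceberg layer surfaces the partition value as java.time.LocalDate, while Hive's vectorized row-batch code expects org.apache.hadoop.hive.common.type.Date, so a direct cast fails. The following minimal Java sketch illustrates the mismatch and an explicit conversion through the epoch-day count; it is an illustration only, not the actual Hive patch, and it assumes Hive's storage-api Date class with its ofEpochDay factory.

{code:java}
import java.time.LocalDate;
import org.apache.hadoop.hive.common.type.Date;

public class PartitionDateConversion {
  public static void main(String[] args) {
    // What the Iceberg reader hands back for a DATE partition value:
    Object partitionValue = LocalDate.of(2020, 11, 20);

    // Date hiveDate = (Date) partitionValue; // throws java.lang.ClassCastException

    // Converting through the epoch-day count instead of casting works:
    Date hiveDate = Date.ofEpochDay((int) ((LocalDate) partitionValue).toEpochDay());
    System.out.println(hiveDate); // 2020-11-20
  }
}
{code}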
[jira] [Commented] (HIVE-27938) Iceberg: Date type Partitioned column throws java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date
[ https://issues.apache.org/jira/browse/HIVE-27938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812108#comment-17812108 ] Simhadri Govindappa commented on HIVE-27938: The error is also present for DATE and DECIMAL columns > Iceberg: Date type Partitioned column throws java.lang.ClassCastException: > java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date > > > Key: HIVE-27938 > URL: https://issues.apache.org/jira/browse/HIVE-27938 > Project: Hive > Issue Type: Bug >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > {code:java} > 1: jdbc:hive2://localhost:10001/> CREATE EXTERNAL TABLE ice3 (`col1` int, > `calday` date) PARTITIONED BY SPEC (calday) stored by iceberg > tblproperties('format-version'='2'); > 1: jdbc:hive2://localhost:10001/>insert into ice3 values(1, '2020-11-20'); > 1: jdbc:hive2://localhost:10001/> select count(calday) from ice3; > {code} > Full stack trace: > {code:java} > INFO : Compiling > command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): > select count(calday) from ice3INFO : No Stats for default@ice3, Columns: > caldayINFO : Semantic Analysis Completed (retrial = false)INFO : Created > Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, > comment:null)], properties:null)INFO : Completed compiling > command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab); > Time taken: 0.196 secondsINFO : Operation QUERY obtained 0 locksINFO : > Executing > command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): > select count(calday) from ice3INFO : Query ID = > root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feabINFO : Total jobs = > 1INFO : Launching Job 1 out of 1INFO : Starting task [Stage-1:MAPRED] in > serial modeINFO : Subscribed to counters: [] for queryId: > root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feabINFO : Session is > already openINFO : Dag name: select count(calday) from ice3 (Stage-1)INFO : > HS2 Host: [localhost], Query ID: > [root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab], Dag ID: > [dag_1701888162260_0001_2], DAG Session ID: > [application_1701888162260_0001]INFO : Status: Running (Executing on YARN > cluster with App id application_1701888162260_0001) > -- > VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING > FAILED > KILLED--Map > 1 container RUNNING 1 0 0 1 > 4 0Reducer 2 container INITED 1 0 > 0 1 0 > 0--VERTICES: > 00/02 [>>--] 0% ELAPSED TIME: 1.41 > s--ERROR > : Status: FailedERROR : Vertex failed, vertexName=Map 1, > vertexId=vertex_1701888162260_0001_2_00, diagnostics=[Task failed, > taskId=task_1701888162260_0001_2_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Error: Error while running task ( failure ) : > attempt_1701888162260_0001_2_00_00_0:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.lang.ClassCastException: java.time.LocalDate cannot be cast to > org.apache.hadoop.hive.common.type.Dateat > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) >at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) >at 
java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutu
[jira] [Comment Edited] (HIVE-27929) Run TPC-DS queries and validate results correctness
[ https://issues.apache.org/jira/browse/HIVE-27929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811790#comment-17811790 ] Simhadri Govindappa edited comment on HIVE-27929 at 1/29/24 9:36 AM: - I was able to run a 1TB TPC-DS run on Hive master, with the following versions: # Hive - master (last commit from 9th of Jan) # Hadoop - 3.3.6 # Tez - 0.10.2 (with a patch to remove the conflicting hadoop-client jar from the classpath) With these versions, * *ORC External:* I was able to run all the TPC-DS queries successfully. * *ORC managed:* Faced the same issue described above. was (Author: simhadri-g): I was able to run a 1TB TPC-DS run on Hive master, with the following versions: # Hive - master (last commit from 9th of Jan) # Hadoop - 3.3.6 # Tez - 0.10.2 (with a patch to remove the conflicting hadoop-client jar from the classpath) With these versions, * *ORC External:* I was able to run all the TPC-DS queries successfully. * *ORC managed:* Faced the same issue described above. ( HIVE-28004 ) > Run TPC-DS queries and validate results correctness > --- > > Key: HIVE-27929 > URL: https://issues.apache.org/jira/browse/HIVE-27929 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Assignee: Simhadri Govindappa >Priority: Major > > release branch: *branch-4.0* > https://github.com/apache/hive/tree/branch-4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27929) Run TPC-DS queries and validate results correctness
[ https://issues.apache.org/jira/browse/HIVE-27929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811790#comment-17811790 ] Simhadri Govindappa commented on HIVE-27929: I was able to run a 1TB TPC-DS run on Hive master, with the following versions: # Hive - master (last commit from 9th of Jan) # Hadoop - 3.3.6 # Tez - 0.10.2 (with a patch to remove the conflicting hadoop-client jar from the classpath) With these versions, * *ORC External:* I was able to run all the TPC-DS queries successfully. * *ORC managed:* Faced the same issue described above. ( HIVE-28004 ) > Run TPC-DS queries and validate results correctness > --- > > Key: HIVE-27929 > URL: https://issues.apache.org/jira/browse/HIVE-27929 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Assignee: Simhadri Govindappa >Priority: Major > > release branch: *branch-4.0* > https://github.com/apache/hive/tree/branch-4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28020) Iceberg: Upgrade iceberg version to 1.4.3
Simhadri Govindappa created HIVE-28020: -- Summary: Iceberg: Upgrade iceberg version to 1.4.3 Key: HIVE-28020 URL: https://issues.apache.org/jira/browse/HIVE-28020 Project: Hive Issue Type: Task Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa Iceberg version 1.4.3 has been released. [https://github.com/apache/iceberg/releases/tag/apache-iceberg-1.4.3] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27938) Iceberg: Date type Partitioned column throws java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date
[ https://issues.apache.org/jira/browse/HIVE-27938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793902#comment-17793902 ] Simhadri Govindappa commented on HIVE-27938: The query works fine when vectorization is disabled:

{noformat}
0: jdbc:hive2://sg-hive-1.sg-hive.root.hwx.si> set hive.vectorized.execution.enabled;
+------------------------------------------+
|                   set                    |
+------------------------------------------+
| hive.vectorized.execution.enabled=true   |
+------------------------------------------+
1 row selected (0.014 seconds)
0: jdbc:hive2://sg-hive-1.sg-hive.root.hwx.si> set hive.vectorized.execution.enabled=false;
No rows affected (0.008 seconds)
0: jdbc:hive2://sg-hive-1.sg-hive.root.hwx.si> set hive.vectorized.execution.enabled;
+------------------------------------------+
|                   set                    |
+------------------------------------------+
| hive.vectorized.execution.enabled=false  |
+------------------------------------------+
1 row selected (0.009 seconds)
0: jdbc:hive2://sg-hive-1.sg-hive.root.hwx.si> select count(calday) from ice3;
INFO : Compiling command(queryId=hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781): select count(calday) from ice3
INFO : No Stats for default@ice3, Columns: calday
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781); Time taken: 0.108 seconds
INFO : Executing command(queryId=hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781): select count(calday) from ice3
INFO : Query ID = hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781
INFO : Session is already open
INFO : Dag name: select count(calday) from ice3 (Stage-1)
INFO : HS2 Host: [sg-hive-1.sg-hive.root.hwx.site], Query ID: [hive_20231206190825_72fc4b03-bfb6-4b61-a421-64a809b46781], Dag ID: [dag_1700588079029_0015_3], DAG Session ID: [application_1700588079029_0015]
INFO : Status: Running (Executing on YARN cluster with App id application_1700588079029_0015)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......      container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 4.11 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 4.08 seconds
INFO : DAG ID: dag_1700588079029_0015_3
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------
INFO : OPERATION                            DURATION
INFO : ----------------------------------------------------------------------
INFO : Compile Query                           0.11s
INFO : Prepare Plan                            0.04s
INFO : Get Query Coordinator (AM)              0.00s
INFO : Submit Plan                             0.02s
INFO : Start DAG                               0.08s
INFO : Run DAG                                 4.08s
INFO : ----------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------
INFO : VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------
INFO : Map 1           2012.00         4,030          123              1               1
INFO : Reducer 2         41.00           440           19              1               0
INFO : ----------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 2
INFO : TOTAL_LAUNCHED_TASKS: 2
INFO : RACK_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 530
INFO : AM_GC_TIME_MILLIS: 28
INFO : INITIAL_HELD_CONTAINERS: 0
[jira] [Created] (HIVE-27938) Iceberg: Date type Partitioned column throws java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date
Simhadri Govindappa created HIVE-27938: -- Summary: Iceberg: Date type Partitioned column throws java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date Key: HIVE-27938 URL: https://issues.apache.org/jira/browse/HIVE-27938 Project: Hive Issue Type: Bug Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa

{code:java}
1: jdbc:hive2://localhost:10001/> CREATE EXTERNAL TABLE ice3 (`col1` int, `calday` date) PARTITIONED BY SPEC (calday) stored by iceberg tblproperties('format-version'='2');
1: jdbc:hive2://localhost:10001/> insert into ice3 values(1, '2020-11-20');
1: jdbc:hive2://localhost:10001/> select count(calday) from ice3;
{code}

Full stack trace:

{code:java}
INFO : Compiling command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): select count(calday) from ice3
INFO : No Stats for default@ice3, Columns: calday
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab); Time taken: 0.196 seconds
INFO : Operation QUERY obtained 0 locks
INFO : Executing command(queryId=root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab): select count(calday) from ice3
INFO : Query ID = root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab
INFO : Session is already open
INFO : Dag name: select count(calday) from ice3 (Stage-1)
INFO : HS2 Host: [localhost], Query ID: [root_20231206184246_e8da1539-7537-45fe-af67-4c7ba219feab], Dag ID: [dag_1701888162260_0001_2], DAG Session ID: [application_1701888162260_0001]
INFO : Status: Running (Executing on YARN cluster with App id application_1701888162260_0001)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1                 container       RUNNING      1          0        0        1       4       0
Reducer 2             container        INITED      1          0        0        1       0       0
----------------------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 1.41 s
----------------------------------------------------------------------------------------------
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1701888162260_0001_2_00, diagnostics=[Task failed, taskId=task_1701888162260_0001_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1701888162260_0001_2_00_00_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.ClassCastException: java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
 at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
 at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
 at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
 at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
 at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
 at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
 at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.h
[jira] [Commented] (HIVE-26673) Incorrect row count when vectorisation is enabled
[ https://issues.apache.org/jira/browse/HIVE-26673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793744#comment-17793744 ] Simhadri Govindappa commented on HIVE-26673: No. This issue is fixed after HIVE-25142 . > Incorrect row count when vectorisation is enabled > - > > Key: HIVE-26673 > URL: https://issues.apache.org/jira/browse/HIVE-26673 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: Simhadri Govindappa >Priority: Major > > Repro: > {noformat} > select count(*) from > (SELECT T0.plant_no, > T0.part_chain, > T0.part_new, > T0.part_no > FROM dm_ads_dims_prod.cloudera_test3 T0 > LEFT JOIN > (SELECT T0.plant_no, > T0.part_chain > FROM > (SELECT T0.plant_no, > T0.part_chain, > count( *) AS ct > FROM dm_ads_dims_prod.cloudera_test3 T0 > WHERE purchase_pos = pos > GROUP BY T0.plant_no, > T0.part_chain) T0 > WHERE ct = 2 ) T1 ON T0.plant_no = T1.plant_no > AND T0.part_chain = T1.part_chain > WHERE T0.purchase_pos = T0.pos > AND (T1.part_chain IS NULL > OR (T1.part_chain IS NOT NULL > AND T0.fd = 1)) ) s; > {noformat} > Run the query with the following settings on the repro cluster a few times > {code:java} > set hive.query.results.cache.enabled=false; > set hive.compute.query.using.stats=false; > set hive.auto.convert.join=true; > {code} > and the results was > {code:java} > 2682424 > 2682426 > 2682425{code} > > Then turn off {{hive.auto.convert.join}} > {code:java} > set hive.query.results.cache.enabled=false; > set hive.compute.query.using.stats=false; > set hive.auto.convert.join=false; > {code} > and the result was always *2682420* > Analyzing the plans with hive.auto.convert.join enabled vs disabled, the > difference is the type of join Map vs Merge. > Additionally, vectorization also plays a role when turned off the result > became good: > {code:java} > SET hive.vectorized.execution.enabled=false; > {code} > It is also just a workaround and has negative impact on performance this > should help us narrow down where to find the cause of the issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26673) Incorrect row count when vectorisation is enabled
[ https://issues.apache.org/jira/browse/HIVE-26673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa resolved HIVE-26673. Fix Version/s: 4.0.0 Resolution: Fixed > Incorrect row count when vectorisation is enabled > - > > Key: HIVE-26673 > URL: https://issues.apache.org/jira/browse/HIVE-26673 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: Simhadri Govindappa >Priority: Major > Fix For: 4.0.0 > > > Repro: > {noformat} > select count(*) from > (SELECT T0.plant_no, > T0.part_chain, > T0.part_new, > T0.part_no > FROM dm_ads_dims_prod.cloudera_test3 T0 > LEFT JOIN > (SELECT T0.plant_no, > T0.part_chain > FROM > (SELECT T0.plant_no, > T0.part_chain, > count( *) AS ct > FROM dm_ads_dims_prod.cloudera_test3 T0 > WHERE purchase_pos = pos > GROUP BY T0.plant_no, > T0.part_chain) T0 > WHERE ct = 2 ) T1 ON T0.plant_no = T1.plant_no > AND T0.part_chain = T1.part_chain > WHERE T0.purchase_pos = T0.pos > AND (T1.part_chain IS NULL > OR (T1.part_chain IS NOT NULL > AND T0.fd = 1)) ) s; > {noformat} > Run the query with the following settings on the repro cluster a few times > {code:java} > set hive.query.results.cache.enabled=false; > set hive.compute.query.using.stats=false; > set hive.auto.convert.join=true; > {code} > and the results was > {code:java} > 2682424 > 2682426 > 2682425{code} > > Then turn off {{hive.auto.convert.join}} > {code:java} > set hive.query.results.cache.enabled=false; > set hive.compute.query.using.stats=false; > set hive.auto.convert.join=false; > {code} > and the result was always *2682420* > Analyzing the plans with hive.auto.convert.join enabled vs disabled, the > difference is the type of join Map vs Merge. > Additionally, vectorization also plays a role when turned off the result > became good: > {code:java} > SET hive.vectorized.execution.enabled=false; > {code} > It is also just a workaround and has negative impact on performance this > should help us narrow down where to find the cause of the issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27772) UNIX_TIMESTAMP should return NULL when date fields are out of bounds
[ https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27772: --- Description: For invalid dates such as 2001-02-31, 2023-04-31 etc., UNIX_TIMESTAMP() returns the timestamp of the last valid date rather than NULL (e.g. UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to '2001-02-28'). However, for calendar days larger than 31, e.g. 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result. In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 0).

{noformat}
6: jdbc:hive2://localhost:10001/> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetimetable;
INFO : Compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetimetable
INFO : No Stats for default@datetimetable, Columns: month, datetimestamp
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, type:string, comment:null), FieldSchema(name:datetimestamp, type:string, comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.102 seconds
INFO : Operation QUERY obtained 0 locks
INFO : Executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetimetable
INFO : Completed executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.0 seconds
+--------+----------------+---------------+
| month  | datetimestamp  | timestampcol  |
+--------+----------------+---------------+
| Feb    | 2001-02-28     | 983318400     |
| Feb    | 2001-02-29     | 983318400     |
| Feb    | 2001-02-30     | 983318400     |
| Feb    | 2001-02-31     | 983318400     |
| Feb    | 2001-02-32     | NULL          |
+--------+----------------+---------------+
5 rows selected (0.131 seconds)
{noformat}

According to the Java JDK: https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103

{noformat}
/**
 * Style to resolve dates and times strictly.
 *
 * Using strict resolution will ensure that all parsed values are within
 * the outer range of valid values for the field. Individual fields may
 * be further processed for strictness.
 *
 * For example, resolving year-month and day-of-month in the ISO calendar
 * system using strict mode will ensure that the day-of-month is valid
 * for the year-month, rejecting invalid values.
 */
STRICT,

/**
 * Style to resolve dates and times in a smart, or intelligent, manner.
 *
 * Using smart resolution will perform the sensible default for each
 * field, which may be the same as strict, the same as lenient, or a third
 * behavior. Individual fields will interpret this differently.
 *
 * For example, resolving year-month and day-of-month in the ISO calendar
 * system using smart mode will ensure that the day-of-month is from
 * 1 to 31, converting any value beyond the last valid day-of-month to be
 * the last valid day-of-month.
 */
SMART,
{noformat}

By default, the DATETIME formatter uses the SMART resolution style and the SIMPLE formatter the LENIENT. Both of these styles are able to resolve "invalid" bounds to valid dates. In order to prevent seemingly "invalid" dates from being parsed successfully, we have to use the STRICT resolution style. However, we cannot simply switch the formatters to always use STRICT resolution because that would break existing applications relying on the existing resolution rules. To address the problem reported here and retain the previous behaviour, we opted to make the resolution style configurable by adding a new property. The new property only affects the DATETIME formatter; the SIMPLE formatter is almost deprecated, so we don't add new features to it.

was: For invalid dates such as 2001-02-31, 2023-04-31 etc., UNIX_TIMESTAMP() returns the timestamp of the last valid date rather than NULL (e.g. UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to '2001-02-28'). However, for calendar days larger than 31, e.g. 2001-02-32 or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result. In Spark and MySQL, UNIX_TIMESTAMP for these invalid dates is NULL (or 0). {noformat} 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetime
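The SMART-versus-STRICT behaviour described above can be reproduced with the JDK alone. A self-contained sketch (plain java.time, independent of Hive; it uses the 'uuuu' proleptic-year pattern because 'yyyy' is year-of-era and needs an era field under STRICT):

{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

public class ResolverStyleDemo {
  public static void main(String[] args) {
    // SMART (the default) clamps an out-of-range day to the last valid day of the month.
    DateTimeFormatter smart = DateTimeFormatter.ofPattern("uuuu-MM-dd")
        .withResolverStyle(ResolverStyle.SMART);
    System.out.println(LocalDate.parse("2001-02-31", smart)); // prints 2001-02-28

    // STRICT rejects the same input, which is what lets UNIX_TIMESTAMP return NULL.
    DateTimeFormatter strict = DateTimeFormatter.ofPattern("uuuu-MM-dd")
        .withResolverStyle(ResolverStyle.STRICT);
    try {
      LocalDate.parse("2001-02-31", strict);
    } catch (DateTimeParseException e) {
      System.out.println("rejected under STRICT: " + e.getMessage());
    }

    // Day-of-month 32 is outside the field's 1-31 range, so even SMART rejects it,
    // matching the NULL row for 2001-02-32 in the output above.
    try {
      LocalDate.parse("2001-02-32", smart);
    } catch (DateTimeParseException e) {
      System.out.println("rejected under SMART: " + e.getMessage());
    }
  }
}
{code}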
[jira] [Commented] (HIVE-27772) UNIX_TIMESTAMP should return NULL when date fields are out of bounds
[ https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775767#comment-17775767 ] Simhadri Govindappa commented on HIVE-27772: Thanks [~zabetak] for the review. I will update the wiki and the Jira description. > UNIX_TIMESTAMP should return NULL when date fields are out of bounds > > > Key: HIVE-27772 > URL: https://issues.apache.org/jira/browse/HIVE-27772 > Project: Hive > Issue Type: Bug >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is > giving out the timestamp value as the last valid date, rather than NULL. > (e.g. UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which > converts to '2001-02-28'. However, for calendar days larger than 31, e.g. > 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result. > In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or > 0). > > {noformat} > 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, > unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from > datetimetable; > INFO : Compiling > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): > select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as > timestampCol from datetimetable > INFO : No Stats for default@datetimetable, Columns: month, datetimestamp > INFO : Semantic Analysis Completed (retrial = false) > INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, > type:string, comment:null), FieldSchema(name:datetimestamp, type:string, > comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], > properties:null) > INFO : Completed compiling > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); > Time taken: 0.102 seconds > INFO : Operation QUERY obtained 0 locks > INFO : Executing > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): > select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as > timestampCol from datetimetable > INFO : Completed executing > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); > Time taken: 0.0 seconds > +++---+ > | month | datetimestamp | timestampcol | > +++---+ > | Feb | 2001-02-28 | 983318400 | > | Feb | 2001-02-29 | 983318400 | > | Feb | 2001-02-30 | 983318400 | > | Feb | 2001-02-31 | 983318400 | > | Feb | 2001-02-32 | NULL | > +++---+ > 5 rows selected (0.131 seconds){noformat} > > > It looks like > [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52] > by default, the formatter has the SMART resolver style. > According to java jdk : > https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103 > > {noformat} > /** > * Style to resolve dates and times strictly. > * > * Using strict resolution will ensure that all parsed values are within > * the outer range of valid values for the field. Individual fields may > * be further processed for strictness. > * > * For example, resolving year-month and day-of-month in the ISO calendar > * system using strict mode will ensure that the day-of-month is valid > * for the year-month, rejecting invalid values. > */ > STRICT, > /** > * Style to resolve dates and times in a smart, or intelligent, manner. 
> * > * Using smart resolution will perform the sensible default for each > * field, which may be the same as strict, the same as lenient, or a third > * behavior. Individual fields will interpret this differently. > * > * For example, resolving year-month and day-of-month in the ISO calendar > * system using smart mode will ensure that the day-of-month is from > * 1 to 31, converting any value beyond the last valid day-of-month to be > * the last valid day-of-month. > */ > SMART,{noformat} > > > Therefore, we should set the resolverStyle to STRICT to reject invalid date > values. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP()should return null for invalid dates
[ https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27772: --- Summary: Hive UNIX_TIMESTAMP()should return null for invalid dates (was: Hive UNIX_TIMESTAMP() not returning null for invalid dates) > Hive UNIX_TIMESTAMP()should return null for invalid dates > - > > Key: HIVE-27772 > URL: https://issues.apache.org/jira/browse/HIVE-27772 > Project: Hive > Issue Type: Bug >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is > giving out the timestamp value as the last valid date, rather than NULL. > (e.g. UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which > converts to '2001-02-28'. However, for calendar days larger than 31, e.g. > 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result. > In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or > 0). > > {noformat} > 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, > unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from > datetimetable; > INFO : Compiling > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): > select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as > timestampCol from datetimetable > INFO : No Stats for default@datetimetable, Columns: month, datetimestamp > INFO : Semantic Analysis Completed (retrial = false) > INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, > type:string, comment:null), FieldSchema(name:datetimestamp, type:string, > comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], > properties:null) > INFO : Completed compiling > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); > Time taken: 0.102 seconds > INFO : Operation QUERY obtained 0 locks > INFO : Executing > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): > select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as > timestampCol from datetimetable > INFO : Completed executing > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); > Time taken: 0.0 seconds > +++---+ > | month | datetimestamp | timestampcol | > +++---+ > | Feb | 2001-02-28 | 983318400 | > | Feb | 2001-02-29 | 983318400 | > | Feb | 2001-02-30 | 983318400 | > | Feb | 2001-02-31 | 983318400 | > | Feb | 2001-02-32 | NULL | > +++---+ > 5 rows selected (0.131 seconds){noformat} > > > It looks like > [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52] > by default, the formatter has the SMART resolver style. > According to java jdk : > https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103 > > {noformat} > /** > * Style to resolve dates and times strictly. > * > * Using strict resolution will ensure that all parsed values are within > * the outer range of valid values for the field. Individual fields may > * be further processed for strictness. > * > * For example, resolving year-month and day-of-month in the ISO calendar > * system using strict mode will ensure that the day-of-month is valid > * for the year-month, rejecting invalid values. > */ > STRICT, > /** > * Style to resolve dates and times in a smart, or intelligent, manner. 
> * > * Using smart resolution will perform the sensible default for each > * field, which may be the same as strict, the same as lenient, or a third > * behavior. Individual fields will interpret this differently. > * > * For example, resolving year-month and day-of-month in the ISO calendar > * system using smart mode will ensure that the day-of-month is from > * 1 to 31, converting any value beyond the last valid day-of-month to be > * the last valid day-of-month. > */ > SMART,{noformat} > > > Therefore, we should set the resolverStyle to STRICT to reject invalid date > values. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates
[ https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27772: --- Description: For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is giving out the timestamp value as the last valid date, rather than NULL. (e.g. UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which converts to '2001-02-28'. However, for calendar days larger than 31, e.g. 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result. In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or 0). {noformat} 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable; INFO : Compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : No Stats for default@datetimetable, Columns: month, datetimestamp INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, type:string, comment:null), FieldSchema(name:datetimestamp, type:string, comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], properties:null) INFO : Completed compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.102 seconds INFO : Operation QUERY obtained 0 locks INFO : Executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : Completed executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.0 seconds +++---+ | month | datetimestamp | timestampcol | +++---+ | Feb | 2001-02-28 | 983318400 | | Feb | 2001-02-29 | 983318400 | | Feb | 2001-02-30 | 983318400 | | Feb | 2001-02-31 | 983318400 | | Feb | 2001-02-32 | NULL | +++---+ 5 rows selected (0.131 seconds){noformat} It looks like [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52] by default, the formatter has the SMART resolver style. According to java jdk : https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103 {noformat} /** * Style to resolve dates and times strictly. * * Using strict resolution will ensure that all parsed values are within * the outer range of valid values for the field. Individual fields may * be further processed for strictness. * * For example, resolving year-month and day-of-month in the ISO calendar * system using strict mode will ensure that the day-of-month is valid * for the year-month, rejecting invalid values. */ STRICT, /** * Style to resolve dates and times in a smart, or intelligent, manner. * * Using smart resolution will perform the sensible default for each * field, which may be the same as strict, the same as lenient, or a third * behavior. Individual fields will interpret this differently. * * For example, resolving year-month and day-of-month in the ISO calendar * system using smart mode will ensure that the day-of-month is from * 1 to 31, converting any value beyond the last valid day-of-month to be * the last valid day-of-month. */ SMART,{noformat} Therefore, we should set the resolverStyle to STRICT to reject invalid date values. 
was: For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is giving out the timestamp value as the last valid date, rather than NULL. (e.g. UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which converts to '2001-02-28'. However, for calendar days larger than 31, e.g. 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result. In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or 0). {noformat} 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable; INFO : Compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : No Stats for default@datetimetable, Columns: month, datetimestamp INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, type:string, commen
[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates
[ https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27772: --- Description: For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is giving out the timestamp value as the last valid date, rather than NULL. (e.g. UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which converts to '2001-02-28'. However, for calendar days larger than 31, e.g. 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result. In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or 0). {noformat} 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable; INFO : Compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : No Stats for default@datetimetable, Columns: month, datetimestamp INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, type:string, comment:null), FieldSchema(name:datetimestamp, type:string, comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], properties:null) INFO : Completed compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.102 seconds INFO : Operation QUERY obtained 0 locks INFO : Executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : Completed executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.0 seconds +++---+ | month | datetimestamp | timestampcol | +++---+ | Feb | 2001-02-28 | 983318400 | | Feb | 2001-02-29 | 983318400 | | Feb | 2001-02-30 | 983318400 | | Feb | 2001-02-31 | 983318400 | | Feb | 2001-02-32 | NULL | +++---+ 5 rows selected (0.131 seconds){noformat} was: {noformat} 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable; INFO : Compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : No Stats for default@datetimetable, Columns: month, datetimestamp INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, type:string, comment:null), FieldSchema(name:datetimestamp, type:string, comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], properties:null) INFO : Completed compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.102 seconds INFO : Operation QUERY obtained 0 locks INFO : Executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : Completed executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.0 seconds +++---+ | month | datetimestamp | timestampcol | +++---+ | Feb | 2001-02-28 | 983318400 | | Feb | 2001-02-29 | 983318400 | | Feb | 2001-02-30 | 983318400 | | Feb | 2001-02-31 | 983318400 | | Feb | 2001-02-32 | NULL | +++---+ 5 rows selected (0.131 
seconds){noformat} > Hive UNIX_TIMESTAMP() not returning null for invalid dates > -- > > Key: HIVE-27772 > URL: https://issues.apache.org/jira/browse/HIVE-27772 > Project: Hive > Issue Type: Bug >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is > giving out the timestamp value as the last valid date, rather than NULL. > (e.g. UNIX_TIMESTAMP('2001-02-31', '-MM-dd') gives 983354400, which > converts to '2001-02-28'. However, for calendar days larger than 31, e.g. > 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result. > In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or > 0). > > > > {noformat} > 6: jdbc:hive2://localhost:10001/> select month, datetimestamp,
[jira] [Updated] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates
[ https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27772: --- Description: {noformat} 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable; INFO : Compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : No Stats for default@datetimetable, Columns: month, datetimestamp INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, type:string, comment:null), FieldSchema(name:datetimestamp, type:string, comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], properties:null) INFO : Completed compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.102 seconds INFO : Operation QUERY obtained 0 locks INFO : Executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from datetimetable INFO : Completed executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.0 seconds +++---+ | month | datetimestamp | timestampcol | +++---+ | Feb | 2001-02-28 | 983318400 | | Feb | 2001-02-29 | 983318400 | | Feb | 2001-02-30 | 983318400 | | Feb | 2001-02-31 | 983318400 | | Feb | 2001-02-32 | NULL | +++---+ 5 rows selected (0.131 seconds){noformat} > Hive UNIX_TIMESTAMP() not returning null for invalid dates > -- > > Key: HIVE-27772 > URL: https://issues.apache.org/jira/browse/HIVE-27772 > Project: Hive > Issue Type: Bug >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > {noformat} > 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, > unix_timestamp(datetimestamp, '-MM-dd') as timestampCol from > datetimetable; > INFO : Compiling > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): > select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as > timestampCol from datetimetable > INFO : No Stats for default@datetimetable, Columns: month, datetimestamp > INFO : Semantic Analysis Completed (retrial = false) > INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, > type:string, comment:null), FieldSchema(name:datetimestamp, type:string, > comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], > properties:null) > INFO : Completed compiling > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); > Time taken: 0.102 seconds > INFO : Operation QUERY obtained 0 locks > INFO : Executing > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): > select month, datetimestamp, unix_timestamp(datetimestamp, '-MM-dd') as > timestampCol from datetimetable > INFO : Completed executing > command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); > Time taken: 0.0 seconds > +++---+ > | month | datetimestamp | timestampcol | > +++---+ > | Feb | 2001-02-28 | 983318400 | > | Feb | 2001-02-29 | 983318400 | > | Feb | 2001-02-30 | 983318400 | > | Feb | 2001-02-31 | 983318400 | > | Feb | 2001-02-32 | NULL | > +++---+ > 5 rows selected (0.131 seconds){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27772) Hive UNIX_TIMESTAMP() not returning null for invalid dates
Simhadri Govindappa created HIVE-27772: -- Summary: Hive UNIX_TIMESTAMP() not returning null for invalid dates Key: HIVE-27772 URL: https://issues.apache.org/jira/browse/HIVE-27772 Project: Hive Issue Type: Bug Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table
[ https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771151#comment-17771151 ] Simhadri Govindappa commented on HIVE-27754: {quote} {noformat} set hive.cbo.fallback.strategy=NEVER; {noformat} Can be used to prevent running these statements. see also: [https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L687-L688] {quote} Here is a qfile which can repro the issue even with 'hive.cbo.fallback.strategy=NEVER;' [https://github.com/simhadri-g/hive/commit/9520fff464c9d1bf400e5e8f43b5f00bf9615825] {quote}If the expression in the where clause has logical operators ({{{}OR{}}}, {{{}AND{}}}, ...) the operands are implicitly casted to boolean [https://github.com/apache/hive/blob/85f6162becb8723ff6c9f85875048ced6ca7ae89/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L842-L847] {quote} Will debug through this. Thanks! > Query Filter with OR condition updates every record in the table > > > Key: HIVE-27754 > URL: https://issues.apache.org/jira/browse/HIVE-27754 > Project: Hive > Issue Type: Bug >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > > {noformat} > UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' > ;{noformat} > After the above statement, all the records are updated. The condition > {{'Taylor'}} is a constant string, and it will always evaluate to true > because it's a non-empty string. So, effectively, {{UPDATE}} statement is > updating all rows in the {{customers_man.}} > {{}} > {{Repro: }} > {noformat} > create table customers_man (customer_id bigint, first_name string) > PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES > ('transactional'='true'); > insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", > "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", > "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", > "Johnson"), (3, "Trudy", "Henderson"); > select * from customers_man; > > ++---+--+ > | customers_man.customer_id | customers_man.first_name | > customers_man.last_name | > > ++---+--+ > | 3 | Blake | Burr >| > | 2 | Jake | Donnel >| > | 3 | Trudy | Henderson >| > | 3 | Trudy | Johnson >| > | 2 | Susan | Morrison >| > | 1 | Joanna| Pierce >| > | 2 | Joanna| Silver >| > | 2 | Bob | Silver >| > | 1 | Sharon| Taylor >| > > ++---+--+ > UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR > last_name='Taylor' ; > select * from customers_man; > > ++---+--+ > | customers_man.customer_id | customers_man.first_name | > customers_man.last_name | > > ++---+--+ > | 3 | Blake | Burr >| > | 2 | Jake | Donnel >| > | 3 | Trudy | Henderson >| > | 3 | Trudy | Johnson >| > | 2 | Susan | Morrison >| > | 22 | Joanna| Pierce >| > | 2 | Joanna| Silver >| > | 2 | Bob | Silver >| > | 22 | Sharon| Taylor >| > > ++---+--+ > UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR > 'Taylor' ; > se
[jira] [Assigned] (HIVE-27754) Query Filter with OR condition updates every record in the table
[ https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa reassigned HIVE-27754: -- Assignee: Simhadri Govindappa > Query Filter with OR condition updates every record in the table > > > Key: HIVE-27754 > URL: https://issues.apache.org/jira/browse/HIVE-27754 > Project: Hive > Issue Type: Bug >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > > {noformat} > UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' > ;{noformat} > After the above statement, all the records are updated. The condition > {{'Taylor'}} is a constant string, and it will always evaluate to true > because it's a non-empty string. So, effectively, {{UPDATE}} statement is > updating all rows in the {{customers_man.}} > {{}} > {{Repro: }} > {noformat} > create table customers_man (customer_id bigint, first_name string) > PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES > ('transactional'='true'); > insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", > "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", > "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", > "Johnson"), (3, "Trudy", "Henderson"); > select * from customers_man; > > ++---+--+ > | customers_man.customer_id | customers_man.first_name | > customers_man.last_name | > > ++---+--+ > | 3 | Blake | Burr >| > | 2 | Jake | Donnel >| > | 3 | Trudy | Henderson >| > | 3 | Trudy | Johnson >| > | 2 | Susan | Morrison >| > | 1 | Joanna| Pierce >| > | 2 | Joanna| Silver >| > | 2 | Bob | Silver >| > | 1 | Sharon| Taylor >| > > ++---+--+ > UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR > last_name='Taylor' ; > select * from customers_man; > > ++---+--+ > | customers_man.customer_id | customers_man.first_name | > customers_man.last_name | > > ++---+--+ > | 3 | Blake | Burr >| > | 2 | Jake | Donnel >| > | 3 | Trudy | Henderson >| > | 3 | Trudy | Johnson >| > | 2 | Susan | Morrison >| > | 22 | Joanna| Pierce >| > | 2 | Joanna| Silver >| > | 2 | Bob | Silver >| > | 22 | Sharon| Taylor >| > > ++---+--+ > UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR > 'Taylor' ; > select * from customers_man; > > ++---+--+ > | customers_man.customer_id | customers_man.first_name | > customers_man.last_name | > > ++---+--+ > | 22 | Blake | Burr > | > | 22 | Jake | Donnel > | > | 22 | Trudy | Henderson > | > | 22 | Trudy | Johnson > | > | 22 | Susan | Morrison > | > | 22 | Joanna| Pierce
[jira] [Created] (HIVE-27754) Query Filter with OR condition updates every record in the table
Simhadri Govindappa created HIVE-27754: -- Summary: Query Filter with OR condition updates every record in the table Key: HIVE-27754 URL: https://issues.apache.org/jira/browse/HIVE-27754 Project: Hive Issue Type: Bug Reporter: Simhadri Govindappa

{noformat}
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' ;
{noformat}

After the above statement, every record is updated. The condition {{'Taylor'}} is a constant string, and it always evaluates to true because it is a non-empty string. So, effectively, the {{UPDATE}} statement updates all rows in {{customers_man}}.

Repro:

{noformat}
create table customers_man (customer_id bigint, first_name string) PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES ('transactional'='true');

insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", "Johnson"), (3, "Trudy", "Henderson");

select * from customers_man;
+----------------------------+---------------------------+--------------------------+
| customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
+----------------------------+---------------------------+--------------------------+
| 3                          | Blake                     | Burr                     |
| 2                          | Jake                      | Donnel                   |
| 3                          | Trudy                     | Henderson                |
| 3                          | Trudy                     | Johnson                  |
| 2                          | Susan                     | Morrison                 |
| 1                          | Joanna                    | Pierce                   |
| 2                          | Joanna                    | Silver                   |
| 2                          | Bob                       | Silver                   |
| 1                          | Sharon                    | Taylor                   |
+----------------------------+---------------------------+--------------------------+

UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR last_name='Taylor' ;

select * from customers_man;
+----------------------------+---------------------------+--------------------------+
| customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
+----------------------------+---------------------------+--------------------------+
| 3                          | Blake                     | Burr                     |
| 2                          | Jake                      | Donnel                   |
| 3                          | Trudy                     | Henderson                |
| 3                          | Trudy                     | Johnson                  |
| 2                          | Susan                     | Morrison                 |
| 22                         | Joanna                    | Pierce                   |
| 2                          | Joanna                    | Silver                   |
| 2                          | Bob                       | Silver                   |
| 22                         | Sharon                    | Taylor                   |
+----------------------------+---------------------------+--------------------------+

UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' ;

select * from customers_man;
+----------------------------+---------------------------+--------------------------+
| customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
+----------------------------+---------------------------+--------------------------+
| 22                         | Blake                     | Burr                     |
| 22                         | Jake                      | Donnel                   |
| 22                         | Trudy                     | Henderson                |
| 22                         | Trudy                     | Johnson                  |
| 22                         | Susan                     | Morrison                 |
| 22                         | Joanna                    | Pierce                   |
| 22                         | Joanna                    | Silver                   |
| 22                         | Bob                       | Silver                   |
| 22                         | Sharon                    | Taylor                   |
+----------------------------+---------------------------+--------------------------+

--- simpler repro
UPDATE customers_man SET customer_id=23 WHERE true;

select * from customers_man;
+
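To spell out the boolean-cast semantics behind the repro, the predicates compare as follows (a SQL sketch restating the statements above):

{noformat}
-- The buggy form: the bare literal 'Taylor' is implicitly cast to boolean,
-- and a non-empty string casts to true, so every row matches:
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';

-- ...which is therefore equivalent to:
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR true;

-- The intended form names the column on both sides of the OR:
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR last_name='Taylor';
{noformat}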
[jira] [Commented] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting writes
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767972#comment-17767972 ] Simhadri Govindappa commented on HIVE-27646: updated the fix version to 4.0.0 > Iceberg: Retry query when concurrent write queries fail due to conflicting > writes > - > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Assume two concurrent update queries- Query A and Query B , that have > overlapping updates. > If Query A commits the data and delete files first, then Query B will fail > with validation failure due to conflicting writes. > In this case, Query B should invalidate the commit files that are already > generated and re-execute the full query on the latest snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010)
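A hypothetical sketch of the retry behaviour this issue describes, using only public Iceberg API calls (Table.refresh(), and the ValidationException thrown on conflicting commits). The runQueryAndCommit callback stands in for re-planning the query, writing fresh data/delete files, and committing them; this is an illustration, not Hive's actual implementation:

{code:java}
import org.apache.iceberg.Table;
import org.apache.iceberg.exceptions.ValidationException;

public final class ConflictRetry {
  // Re-executes the write when a concurrent, conflicting commit is detected.
  public static void commitWithRetry(Table table, Runnable runQueryAndCommit, int maxRetries) {
    int attempts = 0;
    while (true) {
      try {
        runQueryAndCommit.run(); // plan + write + commit against the snapshot seen now
        return;
      } catch (ValidationException e) { // conflicting writes detected at commit time
        if (++attempts > maxRetries) {
          throw e; // give up after the configured number of retries
        }
        table.refresh(); // discard the stale snapshot; the retry sees the latest one
      }
    }
  }
}
{code}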
[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting writes
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27646: --- Fix Version/s: 4.0.0 > Iceberg: Retry query when concurrent write queries fail due to conflicting > writes > - > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Assume two concurrent update queries- Query A and Query B , that have > overlapping updates. > If Query A commits the data and delete files first, then Query B will fail > with validation failure due to conflicting writes. > In this case, Query B should invalidate the commit files that are already > generated and re-execute the full query on the latest snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting writes
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa resolved HIVE-27646. Resolution: Fixed Change merged to master. Thanks [~dkuzmenko] and [@suenalaba |https://github.com/suenalaba] for the review! > Iceberg: Retry query when concurrent write queries fail due to conflicting > writes > - > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > > Assume two concurrent update queries- Query A and Query B , that have > overlapping updates. > If Query A commits the data and delete files first, then Query B will fail > with validation failure due to conflicting writes. > In this case, Query B should invalidate the commit files that are already > generated and re-execute the full query on the latest snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010)
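To illustrate the scenario, a hypothetical pair of overlapping writes against an Iceberg table (the table and column names here are made up for illustration):
{noformat}
-- Session A:
UPDATE orders_ice SET status='shipped'   WHERE order_id < 100;

-- Session B, running concurrently against the same snapshot:
UPDATE orders_ice SET status='cancelled' WHERE order_id < 100;

-- Whichever session commits second fails snapshot validation because of the
-- conflicting data/delete files; with this change, it invalidates its
-- already-generated commit files and re-executes on the latest snapshot
-- instead of surfacing the failure.
{noformat}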
[jira] [Resolved] (HIVE-27656) Upgrade jansi.version to 2.4.0
[ https://issues.apache.org/jira/browse/HIVE-27656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa resolved HIVE-27656. Resolution: Fixed > Upgrade jansi.version to 2.4.0 > --- > > Key: HIVE-27656 > URL: https://issues.apache.org/jira/browse/HIVE-27656 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > > [https://github.com/fusesource/jansi/blob/master/changelog.md] > Arm64/aarch64 support is added in jansi version 2.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27656) Upgrade jansi.version to 2.4.0
[ https://issues.apache.org/jira/browse/HIVE-27656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765132#comment-17765132 ] Simhadri Govindappa commented on HIVE-27656: Change has been merged to master. Thanks, [~zabetak] , [~lvegh] and [~ayushtkn]! > Upgrade jansi.version to 2.4.0 > --- > > Key: HIVE-27656 > URL: https://issues.apache.org/jira/browse/HIVE-27656 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > > [https://github.com/fusesource/jansi/blob/master/changelog.md] > Arm64/aarch64 support is added in jansi version 2.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27567) Support building multi-platform images
[ https://issues.apache.org/jira/browse/HIVE-27567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa resolved HIVE-27567. Resolution: Fixed > Support building multi-platform images > -- > > Key: HIVE-27567 > URL: https://issues.apache.org/jira/browse/HIVE-27567 > Project: Hive > Issue Type: Sub-task >Reporter: Zhihua Deng >Assignee: Simhadri Govindappa >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27567) Support building multi-platform images
[ https://issues.apache.org/jira/browse/HIVE-27567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764104#comment-17764104 ] Simhadri Govindappa commented on HIVE-27567: Fixed in HIVE-27277. Since Hive 4.0.0-beta-1, the Hive docker image supports both arm64 and amd64 platforms. https://hub.docker.com/r/apache/hive/tags > Support building multi-platform images > -- > > Key: HIVE-27567 > URL: https://issues.apache.org/jira/browse/HIVE-27567 > Project: Hive > Issue Type: Sub-task >Reporter: Zhihua Deng >Assignee: Simhadri Govindappa >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27656) Upgrade jansi.version to 2.4.0
Simhadri Govindappa created HIVE-27656: -- Summary: Upgrade jansi.version to 2.4.0 Key: HIVE-27656 URL: https://issues.apache.org/jira/browse/HIVE-27656 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa [https://github.com/fusesource/jansi/blob/master/changelog.md] Arm64/aarch64 support is added in jansi version 2.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27265) Ensure table properties are case-insensitive when translating hms property to iceberg property
[ https://issues.apache.org/jira/browse/HIVE-27265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa resolved HIVE-27265. Resolution: Won't Fix Table properties are case-sensitive. > Ensure table properties are case-insensitive when translating hms property to > iceberg property > -- > > Key: HIVE-27265 > URL: https://issues.apache.org/jira/browse/HIVE-27265 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > > In this example, the "format-version" case is modified to upper case and the > query fails. > > {noformat} > >>>CREATE EXTERNAL TABLE TBL5(ID INT, NAME STRING) PARTITIONED BY (DEPT > >>>STRING) STORED BY ICEBERG STORED AS PARQUET TBLPROPERTIES > >>>('format-version'='2'); > OK > >>>insert into tbl5 values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), > >>>(4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); > OK > >>>delete from tbl5 where name in ('one', 'four') or id = 22; > OK{noformat} > > {noformat} > >>> CREATE EXTERNAL TABLE TBL6(ID INT, NAME STRING) PARTITIONED BY (DEPT > >>> STRING) STORED BY ICEBERG STORED AS PARQUET TBLPROPERTIES > >>> ('FORMAT-VERSION'='2'); > ok > >>>insert into tbl6 values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), > >>>(4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); > ok > >>>delete from tbl6 where name in ('one', 'four') or id = 22; > Error: Error while compiling statement: FAILED: SemanticException [Error > 10297]: Attempt to do update or delete on table tbl6 that is not > transactional (state=42000,code=10297){noformat} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files
Simhadri Govindappa created HIVE-27653: -- Summary: Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files Key: HIVE-27653 URL: https://issues.apache.org/jira/browse/HIVE-27653 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27277) Set up github actions workflow to build and push docker image to docker hub
[ https://issues.apache.org/jira/browse/HIVE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759033#comment-17759033 ] Simhadri Govindappa commented on HIVE-27277: Thanks, [~ayushtkn]. {noformat} tags: ${{ secrets.DOCKERHUB_USER }}/hive:${{ env.tag }}{noformat} This occurred because {{secrets.DOCKERHUB_USER}} should ideally be the repo name, which is "apache", but it turns out it's "afsjenkins". > Set up github actions workflow to build and push docker image to docker hub > --- > > Key: HIVE-27277 > URL: https://issues.apache.org/jira/browse/HIVE-27277 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting writes
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27646: --- Summary: Iceberg: Retry query when concurrent write queries fail due to conflicting writes (was: Iceberg: Retry query when concurrent write queries fail due to conflicting write) > Iceberg: Retry query when concurrent write queries fail due to conflicting > writes > - > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > Assume two concurrent update queries- Query A and Query B , that have > overlapping updates. > If Query A commits the data and delete files first, then Query B will fail > with validation failure due to conflicting writes. > In this case, Query B should invalidate the commit files that are already > generated and re-execute the full query on the latest snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27646: --- Description: Assume two concurrent update queries- Query A and Query B , that have overlapping updates. If Query A commits the data and delete files first, then Query B will fail with validation failure due to conflicting writes. In this case, Query B should invalidate the commit files that are already generated and re-execute the full query on the latest snapshot. was: During concurrent updates, Assume 2 concurrent update queries- Query A and Query B that have insersecting updates If Query A commits the data and delet If any conflicting files are detected during the commit stage of the query that commits last, we will have to re-execute the full query. > Iceberg: Retry query when concurrent write queries fail due to conflicting > write > > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > Assume two concurrent update queries- Query A and Query B , that have > overlapping updates. > If Query A commits the data and delete files first, then Query B will fail > with validation failure due to conflicting writes. > In this case, Query B should invalidate the commit files that are already > generated and re-execute the full query on the latest snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27646: --- Description: During concurrent updates, Assume 2 concurrent update queries- Query A and Query B that have insersecting updates If Query A commits the data and delet If any conflicting files are detected during the commit stage of the query that commits last, we will have to re-execute the full query. was: During concurrent updates, Assume 2 concurrent update quries- Query A If any conflicting files are detected during the commit stage of the query that commits last, we will have to re-execute the full query. > Iceberg: Retry query when concurrent write queries fail due to conflicting > write > > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > During concurrent updates, > Assume 2 concurrent update queries- Query A and Query B that have > insersecting updates > If Query A commits the data and delet > If any conflicting files are detected during the commit stage of the query > that commits last, we will have to re-execute the full query. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27646: --- Description: During concurrent updates, Assume 2 concurrent update quries- Query A If any conflicting files are detected during the commit stage of the query that commits last, we will have to re-execute the full query. was: During concurrent updates, If any conflicting files are detected during the commit stage of the query that commits last , we will have to re-execuete the full query. > Iceberg: Retry query when concurrent write queries fail due to conflicting > write > > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > During concurrent updates, > Assume 2 concurrent update quries- Query A > If any conflicting files are detected during the commit stage of the query > that commits last, we will have to re-execute the full query. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27646: --- Description: During concurrent updates, If any conflicting files are detected during the commit stage of the query that commits last , we will have to re-execuete the full query. > Iceberg: Retry query when concurrent write queries fail due to conflicting > write > > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > During concurrent updates, > If any conflicting files are detected during the commit stage of the query > that commits last , we will have to re-execuete the full query. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27646) Iceberg: Retry query when concurrent write queries fail due to conflicting write
[ https://issues.apache.org/jira/browse/HIVE-27646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27646: --- Summary: Iceberg: Retry query when concurrent write queries fail due to conflicting write (was: Iceberg: Re-execute query when concurrent writes fail due to conflicting write) > Iceberg: Retry query when concurrent write queries fail due to conflicting > write > > > Key: HIVE-27646 > URL: https://issues.apache.org/jira/browse/HIVE-27646 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27646) Iceberg: Re-execute query when concurrent writes fail due to conflicting write
Simhadri Govindappa created HIVE-27646: -- Summary: Iceberg: Re-execute query when concurrent writes fail due to conflicting write Key: HIVE-27646 URL: https://issues.apache.org/jira/browse/HIVE-27646 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27589) Iceberg: Branches of Merge/Update statements should be committed atomically
[ https://issues.apache.org/jira/browse/HIVE-27589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756936#comment-17756936 ] Simhadri Govindappa commented on HIVE-27589: Thanks [~dkuzmenko] , [~krisztiankasa] and [~zhangbutao] :) > Iceberg: Branches of Merge/Update statements should be committed atomically > --- > > Key: HIVE-27589 > URL: https://issues.apache.org/jira/browse/HIVE-27589 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
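For context, a merge of this shape produces data files in its insert branch and delete files in its update branch, and both must land in a single atomic Iceberg commit; the table and column names below are hypothetical:
{noformat}
MERGE INTO target_ice t
USING source_ice s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET val = s.val           -- update branch: delete + data files
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val); -- insert branch: data files
{noformat}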
[jira] [Assigned] (HIVE-27589) Iceberg: Branches of Merge/Update statements should be committed atomically
[ https://issues.apache.org/jira/browse/HIVE-27589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa reassigned HIVE-27589: -- Assignee: Simhadri Govindappa > Iceberg: Branches of Merge/Update statements should be committed atomically > --- > > Key: HIVE-27589 > URL: https://issues.apache.org/jira/browse/HIVE-27589 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Simhadri Govindappa >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27528) Hive iceberg: Alter table command should not call the metastore to update column stats
[ https://issues.apache.org/jira/browse/HIVE-27528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27528: --- Description: l. The bit vector that contains the ndv values is being overwritten by updatecolstats which is called during alter table command. was:It overwrites the previously calculated bit vectors for ndv as well. > Hive iceberg: Alter table command should not call the metastore to update > column stats > --- > > Key: HIVE-27528 > URL: https://issues.apache.org/jira/browse/HIVE-27528 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > l. > The bit vector that contains the ndv values is being overwritten by > updatecolstats which is called during alter table command. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27528) Hive iceberg: Alter table command should not call the metastore to update column stats
[ https://issues.apache.org/jira/browse/HIVE-27528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27528: --- Description: The bit vector that contains the ndv values is being overwritten by updatecolstats which is called during alter table command. was: l. The bit vector that contains the ndv values is being overwritten by updatecolstats which is called during alter table command. > Hive iceberg: Alter table command should not call the metastore to update > column stats > --- > > Key: HIVE-27528 > URL: https://issues.apache.org/jira/browse/HIVE-27528 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > > > The bit vector that contains the ndv values is being overwritten by > updatecolstats which is called during alter table command. -- This message was sent by Atlassian Jira (v8.20.10#820010)
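A minimal sketch of the problematic sequence (the table name is hypothetical, and the altered property is only illustrative):
{noformat}
-- Computing column stats persists the NDV bit vectors:
ANALYZE TABLE tbl_ice COMPUTE STATISTICS FOR COLUMNS;

-- An unrelated ALTER TABLE then triggers an update-column-stats call to the
-- metastore, overwriting the previously computed bit vectors:
ALTER TABLE tbl_ice SET TBLPROPERTIES ('comment'='updated');
{noformat}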
[jira] [Updated] (HIVE-27528) Hive iceberg: Alter table command should not call the metastore to update column stats
[ https://issues.apache.org/jira/browse/HIVE-27528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated HIVE-27528: --- Description: It overwrites the previously calculated bit vectors for ndv as well. > Hive iceberg: Alter table command should not call the metastore to update > column stats > --- > > Key: HIVE-27528 > URL: https://issues.apache.org/jira/browse/HIVE-27528 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > > It overwrites the previously calculated bit vectors for ndv as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27528) Hive iceberg: Alter table command should not call the metastore to update column stats
Simhadri Govindappa created HIVE-27528: -- Summary: Hive iceberg: Alter table command should not call the metastore to update column stats Key: HIVE-27528 URL: https://issues.apache.org/jira/browse/HIVE-27528 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27514) Patched-iceberg-core pom version contains an expression but should be a constant.
Simhadri Govindappa created HIVE-27514: -- Summary: Patched-iceberg-core pom version contains an expression but should be a constant. Key: HIVE-27514 URL: https://issues.apache.org/jira/browse/HIVE-27514 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa When building the iceberg module, Maven throws the following warning:
{noformat}
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.hive:patched-iceberg-api:jar:patched-1.3.0-4.0.0-beta-1-SNAPSHOT
[WARNING] 'version' contains an expression but should be a constant. @ org.apache.hive:patched-iceberg-api:patched-${iceberg.version}-${project.parent.version}, /Users/simhadri.govindappa/Documents/apache/hive/iceberg/patched-iceberg-api/pom.xml, line 12, column 12
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.hive:patched-iceberg-core:jar:patched-1.3.0-4.0.0-beta-1-SNAPSHOT
[WARNING] 'version' contains an expression but should be a constant. @ org.apache.hive:patched-iceberg-core:patched-${iceberg.version}-${project.parent.version}, /Users/simhadri.govindappa/Documents/apache/hive/iceberg/patched-iceberg-core/pom.xml, line 12, column 12
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
{noformat}
Future Maven versions might no longer support building such malformed projects. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27448) Hive Iceberg: Merge column stats
Simhadri Govindappa created HIVE-27448: -- Summary: Hive Iceberg: Merge column stats Key: HIVE-27448 URL: https://issues.apache.org/jira/browse/HIVE-27448 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27433) Hive-site: Add redirect to blogs
Simhadri Govindappa created HIVE-27433: -- Summary: Hive-site: Add redirect to blogs Key: HIVE-27433 URL: https://issues.apache.org/jira/browse/HIVE-27433 Project: Hive Issue Type: Improvement Components: Website Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin
[ https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa reassigned HIVE-27356: -- Assignee: Simhadri Govindappa > Hive should write name of blob type instead of table name in Puffin > --- > > Key: HIVE-27356 > URL: https://issues.apache.org/jira/browse/HIVE-27356 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Simhadri Govindappa >Priority: Major > > Currently Hive writes the name of the table plus snapshot id as blob type: > [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422] > Instead, it should write the name of the blob it writes. Table name and > snapshot id are redundant information anyway, as they can be inferred from > the location and filename of the puffin file. > Currently it writes a non-standard blob (Standard blob types are listed > [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]). > I think it would be better to write standard blobs for interoperability. But > if Hive wants to write non-standard blobs anyway, it should still come up > with a descriptive name for them, e.g. 'hive-column-statistics-v1'. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin
[ https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723889#comment-17723889 ] Simhadri Govindappa edited comment on HIVE-27356 at 5/18/23 10:56 AM: -- Sure. {quote} Currently it writes a non-standard blob (Standard blob types are listed [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]). I think it would be better to write standard blobs for interoperability. But if Hive wants to write non-standard blobs anyway, it should still come up with a descriptive name for them, e.g. 'hive-column-statistics-v1'. {quote} The initial design went with the col stats object. We can easily change this to a different blob type. was (Author: simhadri-g): Sure > Hive should write name of blob type instead of table name in Puffin > --- > > Key: HIVE-27356 > URL: https://issues.apache.org/jira/browse/HIVE-27356 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > > Currently Hive writes the name of the table plus snapshot id as blob type: > [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422] > Instead, it should write the name of the blob it writes. Table name and > snapshot id are redundant information anyway, as they can be inferred from > the location and filename of the puffin file. > Currently it writes a non-standard blob (Standard blob types are listed > [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]). > I think it would be better to write standard blobs for interoperability. But > if Hive wants to write non-standard blobs anyway, it should still come up > with a descriptive name for them, e.g. 'hive-column-statistics-v1'. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin
[ https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723889#comment-17723889 ] Simhadri Govindappa commented on HIVE-27356: Sure > Hive should write name of blob type instead of table name in Puffin > --- > > Key: HIVE-27356 > URL: https://issues.apache.org/jira/browse/HIVE-27356 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > > Currently Hive writes the name of the table plus snapshot id as blob type: > [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422] > Instead, it should write the name of the blob it writes. Table name and > snapshot id are redundant information anyway, as they can be inferred from > the location and filename of the puffin file. > Currently it writes a non-standard blob (Standard blob types are listed > [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]). > I think it would be better to write standard blobs for interoperability. But > if Hive wants to write non-standard blobs anyway, it should still come up > with a descriptive name for them, e.g. 'hive-column-statistics-v1'. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27329) Document usage of the image
[ https://issues.apache.org/jira/browse/HIVE-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722236#comment-17722236 ] Simhadri Govindappa commented on HIVE-27329: Got it. Updated it in the official Docker Hub: [https://hub.docker.com/r/apache/hive] I am also updating the Hive website to include this: [https://github.com/apache/hive-site/pull/5] > Document usage of the image > --- > > Key: HIVE-27329 > URL: https://issues.apache.org/jira/browse/HIVE-27329 > Project: Hive > Issue Type: Sub-task >Reporter: Zhihua Deng >Assignee: Simhadri Govindappa >Priority: Major > > After we pushed the image to docker hub, it would be good to update > https://cwiki.apache.org/confluence/display/Hive/GettingStarted for using the > image. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27329) Document usage of the image
[ https://issues.apache.org/jira/browse/HIVE-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa reassigned HIVE-27329: -- Assignee: Simhadri Govindappa > Document usage of the image > --- > > Key: HIVE-27329 > URL: https://issues.apache.org/jira/browse/HIVE-27329 > Project: Hive > Issue Type: Sub-task >Reporter: Zhihua Deng >Assignee: Simhadri Govindappa >Priority: Major > > After we pushed the image to docker hub, it would be good to update > https://cwiki.apache.org/confluence/display/Hive/GettingStarted for using the > image. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27339) Hive Website: Add links to the new hive dockerhub
Simhadri Govindappa created HIVE-27339: -- Summary: Hive Website: Add links to the new hive dockerhub Key: HIVE-27339 URL: https://issues.apache.org/jira/browse/HIVE-27339 Project: Hive Issue Type: Improvement Components: Website Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27329) Document usage of the image
[ https://issues.apache.org/jira/browse/HIVE-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721689#comment-17721689 ] Simhadri Govindappa commented on HIVE-27329: Sure > Document usage of the image > --- > > Key: HIVE-27329 > URL: https://issues.apache.org/jira/browse/HIVE-27329 > Project: Hive > Issue Type: Sub-task >Reporter: Zhihua Deng >Priority: Major > > After we pushed the image to docker hub, it would be good to update > https://cwiki.apache.org/confluence/display/Hive/GettingStarted for using the > image. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27327) Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans
Simhadri Govindappa created HIVE-27327: -- Summary: Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans Key: HIVE-27327 URL: https://issues.apache.org/jira/browse/HIVE-27327 Project: Hive Issue Type: Bug Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa In the absence of equality deletes, the total row count should be:
{noformat}
row_count = total-records - total-position-deletes{noformat}
Example: After many inserts and deletes, there are only 46 records in the table.
{noformat}
>> select count(*) from llap_orders;
+------+
| _c0  |
+------+
| 46   |
+------+
1 row selected (7.22 seconds)
{noformat}
But the total-records field in the snapshot summary indicates that there are 300 records:
{noformat}
{
  "sequence-number" : 19,
  "snapshot-id" : 4237525869561629328,
  "parent-snapshot-id" : 2572487769557272977,
  "timestamp-ms" : 1683553017982,
  "summary" : {
    "operation" : "append",
    "added-data-files" : "5",
    "added-records" : "12",
    "added-files-size" : "3613",
    "changed-partition-count" : "5",
    "total-records" : "300",
    "total-files-size" : "164405",
    "total-data-files" : "100",
    "total-delete-files" : "73",
    "total-position-deletes" : "254",
    "total-equality-deletes" : "0"
  }
}
{noformat}
As a result, the Hive plans generated are unoptimized.
{noformat}
0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set itemid=7 where itemid=5;
INFO : OK
+----------------------------------------------------------------------------+
| Explain |
+----------------------------------------------------------------------------+
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (SIMPLE_EDGE) |
| Reducer 3 <- Map 1 (SIMPLE_EDGE) |
| |
| Stage-4 |
| Stats Work{} |
| Stage-0 |
| Move Operator |
| table:{"name:":"db.llap_orders"} |
| Stage-3 |
| Dependency Collection{} |
| Stage-2 |
| Reducer 2 vectorized |
| File Output Operator [FS_14] |
| table:{"name:":"db.llap_orders"} |
| Select Operator [SEL_13] (rows=150 width=424) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] |
| <-Map 1 [SIMPLE_EDGE] |
| SHUFFLE [RS_4] |
| Select Operator [SEL_3] (rows=150 width=424) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9"] |
| Select Operator [SEL_2] (rows=150 width=644) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9","_col10","_col11","_col13","_col14","_col15"] |
| Filter Operator [FIL_9] (rows=150 width=220) |
| predicate:(itemid = 5) |
| TableScan [TS_0] (rows=300 width=220) |
| db@llap_orders,llap_orders,Tbl:COMPLETE,Col:COMPLETE,Output:["orderid","quantity","itemid","tradets","p1","p2"] |
| Reducer 3 vectorized |
| File Output Operator [FS_16] |
| table:{"name:":"db.llap_orders"} |
| Select Operator [SEL_15] |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col4","_col5"] |
| <-Map 1 [SIMPLE_EDGE] |
| SHUFFLE [RS_10] |
| PartitionCols:_col4, _col5 |
| Select Operator [SEL_7] (rows=150 width=220) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5"] |
| Please refer to the previous Select Operator [SEL_2] |
| |
+----------------------------------------------------------------------------+
39 rows selected (0.104 seconds)
0: jdbc:hive2://simhadrigovindappa-2.simhadri>
{noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
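Applying the formula to the numbers in this snapshot summary shows what the scan estimate should have been:
{noformat}
row_count = total-records - total-position-deletes
          = 300 - 254
          = 46    -- matches count(*); the TableScan above still estimates 300 rows
{noformat}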
[jira] [Resolved] (HIVE-23394) TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky
[ https://issues.apache.org/jira/browse/HIVE-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa resolved HIVE-23394. Resolution: Fixed > TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky > > > Key: HIVE-23394 > URL: https://issues.apache.org/jira/browse/HIVE-23394 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > both > TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1 and > TestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1 > can fail with the exception below > seems like the connection was lost > {code} > Error Message > Failed to close statement > Stacktrace > java.sql.SQLException: Failed to close statement > at > org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:200) > at > org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:205) > at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:222) > at > org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.runQuery(AbstractTestJdbcGenericUDTFGetSplits.java:135) > at > org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1(AbstractTestJdbcGenericUDTFGetSplits.java:164) > at > org.apache.hive.jdbc.TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1(TestJdbcGenericUDTFGetSplits2.java:28) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > Caused by: org.apache.thrift.TApplicationException: CloseOperation failed: > out of sequence response > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:84) > at > org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:521) > at > org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:508) > at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1732) > at com.sun.proxy.$Proxy146.CloseOperation(Unknown Source) > at > org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:193) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-23394) TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky
[ https://issues.apache.org/jira/browse/HIVE-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720466#comment-17720466 ] Simhadri Govindappa commented on HIVE-23394: The change is merged to master. Thanks, [~dkuzmenko] ,[~ayushtkn] for the review! > TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky > > > Key: HIVE-23394 > URL: https://issues.apache.org/jira/browse/HIVE-23394 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > both > TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1 and > TestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1 > can fail with the exception below > seems like the connection was lost > {code} > Error Message > Failed to close statement > Stacktrace > java.sql.SQLException: Failed to close statement > at > org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:200) > at > org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:205) > at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:222) > at > org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.runQuery(AbstractTestJdbcGenericUDTFGetSplits.java:135) > at > org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1(AbstractTestJdbcGenericUDTFGetSplits.java:164) > at > org.apache.hive.jdbc.TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1(TestJdbcGenericUDTFGetSplits2.java:28) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > Caused by: org.apache.thrift.TApplicationException: CloseOperation failed: > out of sequence response > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:84) > at > org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:521) > at > org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:508) > at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1732) > at com.sun.proxy.$Proxy146.CloseOperation(Unknown Source) > at > org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:193) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-27277) Set up github actions workflow to build and push docker image to docker hub
[ https://issues.apache.org/jira/browse/HIVE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716378#comment-17716378 ] Simhadri Govindappa edited comment on HIVE-27277 at 4/25/23 6:04 PM: - INFRA-24505 : Docker Repo created for apache hive: [https://hub.docker.com/r/apache/hive] was (Author: simhadri-g): Docker Repo created for apache hive: [https://hub.docker.com/r/apache/hive] > Set up github actions workflow to build and push docker image to docker hub > --- > > Key: HIVE-27277 > URL: https://issues.apache.org/jira/browse/HIVE-27277 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)