[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
[ https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788932 ]

ASF GitHub Bot logged work on HIVE-26373:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Jul/22 10:40
Start Date: 08/Jul/22 10:40
Worklog Time Spent: 10m
Work Description: zabetak closed pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
URL: https://github.com/apache/hive/pull/3418


Issue Time Tracking
-------------------

Worklog Id: (was: 788932)
Time Spent: 1h 10m (was: 1h)

> ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26373
>                 URL: https://issues.apache.org/jira/browse/HIVE-26373
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For Avro data where the schema has a nested struct with a Timestamp field, we get the following ClassCastException:
> {code:java}
> 2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.Timestamp cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
>     at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
>     at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
>     at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
>     at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
>     ... 11 more
> {code}
> The problem starts in the {{toLazyObject}} method of {*}AvroLazyObjectInspector.java{*}, when [this|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L347] condition returns false for {*}Timestamp{*}, preventing the conversion of *Timestamp* to *LazyTimestamp* [here|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java#L132].
> The solution is to return {{true}} for Timestamps in the {{isPrimitive}} method.
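The fix above hinges on one predicate: when `isPrimitive` answers `false` for a Timestamp value, the value is not wrapped into a `LazyTimestamp` and later reaches the lazy object inspector as a raw `Timestamp`, where the cast to `LazyPrimitive` fails. Below is a minimal, self-contained sketch of that predicate before and after the change. It is not Hive's `AvroLazyObjectInspector` code: the class and method names are hypothetical, and the nested `Timestamp` class is only a stand-in for `org.apache.hadoop.hive.common.type.Timestamp`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the "is this value primitive?" check described in HIVE-26373. */
public final class PrimitiveCheckSketch {
    private PrimitiveCheckSketch() {}

    /** Stand-in for org.apache.hadoop.hive.common.type.Timestamp (assumption, not the real class). */
    static final class Timestamp {}

    /** Boxed wrapper types, treated as primitive alongside the JVM primitive types. */
    private static final Set<Class<?>> WRAPPERS = new HashSet<Class<?>>(Arrays.<Class<?>>asList(
            Boolean.class, Character.class, Byte.class, Short.class,
            Integer.class, Long.class, Float.class, Double.class));

    /** Behaviour reported in the issue: a Timestamp is not recognised, so it is never wrapped. */
    static boolean isPrimitiveBefore(Class<?> clazz) {
        return clazz.isPrimitive() || WRAPPERS.contains(clazz) || clazz == String.class;
    }

    /** Behaviour after the proposed fix: a Timestamp also counts as primitive. */
    static boolean isPrimitiveAfter(Class<?> clazz) {
        return isPrimitiveBefore(clazz) || clazz == Timestamp.class;
    }

    public static void main(String[] args) {
        System.out.println(isPrimitiveBefore(Timestamp.class)); // false
        System.out.println(isPrimitiveAfter(Timestamp.class));  // true
    }
}
```

Matching on the class object rather than a class name keeps the sketch free of string comparisons; the real patch may test the type differently.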
[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
[ https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788628 ]

ASF GitHub Bot logged work on HIVE-26373:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:29
Start Date: 07/Jul/22 13:29
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915874809


## hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out: ##
@@ -0,0 +1,45 @@
+PREHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl
+POSTHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl
+PREHOOK: query: select data_frV4.dischargedate.value from tbl
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl
+ A masked pattern was here
+POSTHOOK: query: select data_frV4.dischargedate.value from tbl
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl
+ A masked pattern was here
+1970-01-19 20:16:19.2

Review Comment:
The resolution of this is discussed here: https://github.com/apache/hive/pull/3418#issuecomment-1177583388


Issue Time Tracking
-------------------

Worklog Id: (was: 788628)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
[ https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788627 ]

ASF GitHub Bot logged work on HIVE-26373:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:27
Start Date: 07/Jul/22 13:27
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915873512


## hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out: ##
@@ -0,0 +1,45 @@
...
+1970-01-19 20:16:19.2

Review Comment:
Copying here the offline follow-up by Soumyakanti:

It's because `"logicalType": "timestamp-millis"` is defined in the avsc. I had to make this change:
```java
dateRecord.put("value", LocalDate.of(2022,7,5).atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli());
```
However, right now the result I am getting for this is: 2022-07-04 17:00:00


Issue Time Tracking
-------------------

Worklog Id: (was: 788627)
Time Spent: 50m (was: 40m)
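The reported `2022-07-04 17:00:00` is what midnight UTC on 2022-07-05 looks like when it is rendered as a wall-clock time in a zone seven hours behind UTC. A short `java.time` sketch of that arithmetic; the zone `America/Los_Angeles` is an assumption for illustration (its July offset, -07:00, matches the observed shift), and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public final class MillisInLocalZone {
    private MillisInLocalZone() {}

    // Assumption for illustration: the JVM that ran the query used a US Pacific zone.
    static final ZoneId TEST_ZONE = ZoneId.of("America/Los_Angeles");

    /** Epoch milliseconds for midnight UTC on the given date, as in the corrected test line. */
    static long utcMidnightMillis(LocalDate date) {
        return date.atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    /** Renders an epoch-millis value as a wall-clock time in the given zone (UTC to local). */
    static LocalDateTime asLocal(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone).toLocalDateTime();
    }

    public static void main(String[] args) {
        long millis = utcMidnightMillis(LocalDate.of(2022, 7, 5));
        System.out.println(millis);                     // 1656979200000
        System.out.println(asLocal(millis, TEST_ZONE)); // 2022-07-04T17:00
    }
}
```

Rendering the same value with `ZoneOffset.UTC` instead gives `2022-07-05T00:00`, so the seven-hour shift comes entirely from the read-side zone.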
[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
[ https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788618 ]

ASF GitHub Bot logged work on HIVE-26373:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:05
Start Date: 07/Jul/22 13:05
Worklog Time Spent: 10m
Work Description: zabetak commented on PR #3418:
URL: https://github.com/apache/hive/pull/3418#issuecomment-1177583388

Hive has always converted data from the local time zone to UTC when writing, and from UTC to the local time zone when reading. I updated the way the timestamp is stored in HBase (https://github.com/apache/hive/pull/3418/commits/fc9bc94be427a02485b089c2aeb6b494644beb05) to make it coherent with the way it is read by the query.

There are properties and Avro file metadata that control whether the conversion is performed (e.g., `hive.avro.timestamp.skip.conversion`), but these do not currently work for HBase (or for anything else that relies on `AvroLazyObjectInspector`). This is a bug that should be fixed, but it is out of the scope of this PR.


Issue Time Tracking
-------------------

Worklog Id: (was: 788618)
Time Spent: 40m (was: 0.5h)
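The convention described in the comment (local time zone to UTC on write, UTC to local time zone on read) only returns the original wall-clock value when both steps use the same zone. A minimal `java.time` sketch of that symmetry, assuming an epoch-millis representation as in the `timestamp-millis` schema; the class and method names are hypothetical, and this is not the code from commit fc9bc94.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public final class LocalZoneRoundTrip {
    private LocalZoneRoundTrip() {}

    /** Write side: interpret the wall-clock value in the local zone and store UTC epoch millis. */
    static long toUtcMillis(LocalDateTime wallClock, ZoneId localZone) {
        return wallClock.atZone(localZone).toInstant().toEpochMilli();
    }

    /** Read side: convert stored UTC epoch millis back to a wall-clock value in the local zone. */
    static LocalDateTime fromUtcMillis(long epochMillis, ZoneId localZone) {
        return Instant.ofEpochMilli(epochMillis).atZone(localZone).toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId local = ZoneId.of("America/Los_Angeles"); // assumed reader zone, for illustration
        LocalDateTime expected = LocalDateTime.of(2022, 7, 5, 0, 0);
        long stored = toUtcMillis(expected, local);
        System.out.println(stored);                       // 1657004400000
        System.out.println(fromUtcMillis(stored, local)); // 2022-07-05T00:00
    }
}
```

The round trip holds for any wall-clock time that exists in the zone; a local time skipped by a daylight-saving transition is shifted forward by `atZone`, so it does not come back unchanged. Writing with UTC and reading in a UTC-7 zone reproduces the `2022-07-04 17:00:00` seen earlier in this thread.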
[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
[ https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788365 ]

ASF GitHub Bot logged work on HIVE-26373:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Jul/22 18:44
Start Date: 06/Jul/22 18:44
Worklog Time Spent: 10m
Work Description: soumyakanti3578 commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915150677


## hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out: ##
@@ -0,0 +1,45 @@
...
+1970-01-19 20:16:19.2

Review Comment:
`2022-07-05 00:00:00` = 1657004400
`1970-01-19 20:16:19.2` = 1657004

So somewhere we are clipping the last three digits. I will check.


Issue Time Tracking
-------------------

Worklog Id: (was: 788365)
Time Spent: 0.5h (was: 20m)
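The "clipping" turned out to be a unit mismatch: as Soumyakanti's follow-up in this thread confirms, the schema declares `timestamp-millis`, but the original test stored epoch seconds (`toEpochSecond`), so 1656979200 seconds were read back as 1656979200 milliseconds. A minimal `java.time` sketch reproducing the observed value; it assumes, for illustration only, that the query ran in `America/Los_Angeles` (UTC-8 in January 1970), and the class and method names are hypothetical. Midnight UTC on 2022-07-05 is 1656979200; the 1657004400 in the comment is midnight in a UTC-7 zone.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public final class SecondsReadAsMillis {
    private SecondsReadAsMillis() {}

    // Assumption for illustration: the query ran in a US Pacific zone.
    static final ZoneId TEST_ZONE = ZoneId.of("America/Los_Angeles");

    /** What the original test wrote: epoch *seconds* for midnight UTC on the given date. */
    static long utcMidnightSeconds(LocalDate date) {
        return date.atStartOfDay().toEpochSecond(ZoneOffset.UTC);
    }

    /** What a timestamp-millis reader does with the same number: treat it as milliseconds. */
    static Instant readAsMillis(long value) {
        return Instant.ofEpochMilli(value);
    }

    public static void main(String[] args) {
        long written = utcMidnightSeconds(LocalDate.of(2022, 7, 5));
        Instant read = readAsMillis(written);
        LocalDateTime shown = read.atZone(TEST_ZONE).toLocalDateTime();
        System.out.println(written); // 1656979200
        System.out.println(read);    // 1970-01-20T04:16:19.200Z
        System.out.println(shown);   // 1970-01-19T20:16:19.200
    }
}
```

Multiplying the written value by 1,000 before reading it as milliseconds gives back `2022-07-05T00:00:00Z`, which is what switching the test to `toEpochMilli()` achieves.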
[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
[ https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788361 ]

ASF GitHub Bot logged work on HIVE-26373:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Jul/22 18:22
Start Date: 06/Jul/22 18:22
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915124542


## hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out: ##
@@ -0,0 +1,45 @@
...
+1970-01-19 20:16:19.2

Review Comment:
The result does not appear to be correct. I was expecting to see something like `2022-07-05 00:00:00`. Are we doing something wrong when inserting the data? Are we doing something wrong while fetching the data?


## itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java: ##
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));

Review Comment:
This path concatenation may not be portable to different OS.


## itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java: ##
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));

Review Comment:
Make method static.


## itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java: ##
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));
+    GenericData.Record rootRecord = new GenericData.Record(schema);
+    rootRecord.put("id", "X338092");
+    GenericData.Record dateRecord = new GenericData.Record(schema.getField("dischargedate").schema());
+    dateRecord.put("value", LocalDate.of(2022,7,5).atStartOfDay().toEpochSecond(ZoneOffset.UTC));
+    rootRecord.put("dischargedate", dateRecord);
+
+    try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
+      try (DataFileWriter dataFileWriter
+          = new DataFileWriter(new GenericDatumWriter<>(schema))) {
+        dataFileWriter.create(schema, out);
+        dataFileWriter.append(rootRecord);
+      }
+      return out.toByteArray();
+    }
+  }
+
+  private void createAvroTable() throws IOException {
+    final String HBASE_TABLE_NAME = "HiveAvroTable";
+    HTableDescriptor htableDesc = new HTableDescriptor(TableName.valueOf(HBASE_TABLE_NAME));
+    htableDesc.addFamily(new HColumnDescriptor("data".getBytes()));
+
+    try (Admin hbaseAdmin = hbaseConn.getAdmin()) {
+      hbaseAdmin.createTable(htableDesc);
+      try (Table table = hbaseConn.getTable(TableName.valueOf(HBASE_TABLE_NAME))) {
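On the portability remark: `java.io.File(String parent, String child)` and `java.nio.file.Paths.get(String first, String... more)` both insert the platform's separator, so the schema path does not need a hard-coded `"/"`. A small sketch written as static helpers, in line with the "Make method static" comment; the class and method names are hypothetical, and in the test setup the directory would come from `System.getProperty("test.data.dir")` as in the diff.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helpers for locating the Avro schema used by the HBase test. */
public final class SchemaPaths {
    private SchemaPaths() {}

    /** java.io variant: File(parent, child) joins the parts with File.separator. */
    static File schemaFile(String dataDir, String fileName) {
        return new File(dataDir, fileName);
    }

    /** java.nio variant: Paths.get joins the parts with the default file system's separator. */
    static Path schemaPath(String dataDir, String fileName) {
        return Paths.get(dataDir, fileName);
    }

    public static void main(String[] args) {
        // Falls back to a hypothetical "data" directory when the property is unset.
        String dataDir = System.getProperty("test.data.dir", "data");
        System.out.println(schemaFile(dataDir, "nested_ts.avsc"));
        System.out.println(schemaPath(dataDir, "nested_ts.avsc"));
    }
}
```

Avro's `Schema.Parser` has a `parse(File)` overload, so the `File` variant can be passed to it directly in place of `new File(dataDir+"/nested_ts.avsc")`.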
[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
[ https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788355 ]

ASF GitHub Bot logged work on HIVE-26373:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Jul/22 17:40
Start Date: 06/Jul/22 17:40
Worklog Time Spent: 10m
Work Description: soumyakanti3578 opened a new pull request, #3418:
URL: https://github.com/apache/hive/pull/3418

### What changes were proposed in this pull request?
`isPrimitive` returns `true` for Timestamp.

### Why are the changes needed?
`isPrimitive` was returning `false`, so the `Timestamp` type was not being converted to `LazyTimestamp`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
`mvn test -Dtest=TestHBaseCliDriver -Dtest.output.overwrite=true -Dqfile=hbase_avro_nested_timestamp.q`


Issue Time Tracking
-------------------

Worklog Id: (was: 788355)
Remaining Estimate: 0h
Time Spent: 10m