[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788932
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 08/Jul/22 10:40
Start Date: 08/Jul/22 10:40
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #3418: HIVE-26373: 
ClassCastException while inserting Avro data into Hbase table for nested struct 
with Timestamp
URL: https://github.com/apache/hive/pull/3418




Issue Time Tracking
---

Worklog Id: (was: 788932)
Time Spent: 1h 10m  (was: 1h)

> ClassCastException while inserting Avro data into Hbase table for nested 
> struct with Timestamp
> --
>
> Key: HIVE-26373
> URL: https://issues.apache.org/jira/browse/HIVE-26373
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For Avro data whose schema has a nested struct with a Timestamp field, 
> we get the following ClassCastException:
> {code:java}
> 2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
> at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.Timestamp cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
> at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
> at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
> at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
> at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
> at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
> at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
> at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
> at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
> at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
> ... 11 more {code}
> The problem starts in the {{toLazyObject}} method of 
> {*}AvroLazyObjectInspector.java{*}, where 
> [this|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L347]
>  condition returns false for {*}Timestamp{*}, which prevents the conversion of 
> *Timestamp* to *LazyTimestamp* 
> [here|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java#L132].
> The solution is to make the {{isPrimitive}} method return {{true}} for 
> Timestamps.
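The `isPrimitive` decision described above can be modelled in plain Java. This is a hypothetical, self-contained sketch, not the Hive source: it assumes the check accepts Java primitives, their boxed wrappers and `String`, and it uses a made-up stand-in class `HiveTimestampStandIn` in place of `org.apache.hadoop.hive.common.type.Timestamp`. The real body of `AvroLazyObjectInspector.isPrimitive` may differ.

```java
import java.util.Set;

/** Hypothetical model of an isPrimitive-style check before and after the HIVE-26373 idea. */
class IsPrimitiveSketch {
    // Boxed wrappers of the eight Java primitive types.
    private static final Set<Class<?>> WRAPPERS = Set.of(
            Boolean.class, Byte.class, Character.class, Short.class,
            Integer.class, Long.class, Float.class, Double.class);

    // Assumed "before" behaviour: primitives, their wrappers and String only.
    static boolean isPrimitiveBefore(Class<?> clazz) {
        return clazz.isPrimitive() || WRAPPERS.contains(clazz) || clazz == String.class;
    }

    // "After": the timestamp class is also treated as primitive, so a caller would take
    // the branch that converts the value to a lazy primitive instead of passing it through.
    static boolean isPrimitiveAfter(Class<?> clazz) {
        return isPrimitiveBefore(clazz) || clazz == HiveTimestampStandIn.class;
    }

    public static void main(String[] args) {
        Object value = new HiveTimestampStandIn(0L);
        System.out.println("before: " + isPrimitiveBefore(value.getClass())); // before: false
        System.out.println("after:  " + isPrimitiveAfter(value.getClass()));  // after:  true
    }
}

/** Hypothetical stand-in for org.apache.hadoop.hive.common.type.Timestamp. */
class HiveTimestampStandIn {
    final long epochMillis;

    HiveTimestampStandIn(long epochMillis) {
        this.epochMillis = epochMillis;
    }
}
```

With the "before" check, a timestamp object falls through to the non-primitive path, which matches the symptom in the stack trace: the raw `Timestamp` reaches code that expects a `LazyPrimitive`.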




[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788628
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:29
Start Date: 07/Jul/22 13:29
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915874809


##
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##
@@ -0,0 +1,45 @@
+PREHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl
+POSTHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl
+PREHOOK: query: select data_frV4.dischargedate.value from tbl
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl
+ A masked pattern was here 
+POSTHOOK: query: select data_frV4.dischargedate.value from tbl
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl
+ A masked pattern was here 
+1970-01-19 20:16:19.2

Review Comment:
   The resolution of this is discussed here: 
https://github.com/apache/hive/pull/3418#issuecomment-1177583388





Issue Time Tracking
---

Worklog Id: (was: 788628)
Time Spent: 1h  (was: 50m)

[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788627
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:27
Start Date: 07/Jul/22 13:27
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915873512


##
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##
@@ -0,0 +1,45 @@
+PREHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl
+POSTHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl
+PREHOOK: query: select data_frV4.dischargedate.value from tbl
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl
+ A masked pattern was here 
+POSTHOOK: query: select data_frV4.dischargedate.value from tbl
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl
+ A masked pattern was here 
+1970-01-19 20:16:19.2

Review Comment:
   Copying here the offline follow-up by Soumyakanti:
   
   It's because `"logicalType": "timestamp-millis"` is defined in the avsc.
   
   I had to make this change 
   ```java
   dateRecord.put("value", LocalDate.of(2022,7,5).atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli());
   ```
   
   However, right now the result I am getting for this is: 2022-07-04 17:00:00
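   The effect of switching from `toEpochSecond` to `toEpochMilli` can be checked with plain `java.time`. This is a sketch, not the test code: it assumes the test JVM runs in `America/Los_Angeles` (US/Pacific), which is consistent with the 17:00 result above, and the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class EpochMillisSketch {
    // Epoch milliseconds for midnight 2022-07-05 UTC, computed as in the follow-up above.
    static long utcMidnightMillis() {
        return LocalDate.of(2022, 7, 5).atStartOfDay()
                .atZone(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Render an epoch-millis value as wall-clock time in the given zone,
    // mimicking a reader that converts from UTC to the local time zone.
    static LocalDateTime asLocal(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        long millis = utcMidnightMillis();
        System.out.println(millis); // 1656979200000
        System.out.println(asLocal(millis, ZoneId.of("America/Los_Angeles"))); // 2022-07-04T17:00
    }
}
```

   Midnight UTC is 17:00 the previous day in Pacific daylight time (UTC-7), so the millisecond value is now interpreted correctly and only the time-zone shift remains.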





Issue Time Tracking
---

Worklog Id: (was: 788627)
Time Spent: 50m  (was: 40m)

[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788618
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:05
Start Date: 07/Jul/22 13:05
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #3418:
URL: https://github.com/apache/hive/pull/3418#issuecomment-1177583388

   Hive has always converted data from the local time zone to UTC when writing and from UTC to the local time zone when reading. I updated the way the timestamp is stored in HBase (https://github.com/apache/hive/pull/3418/commits/fc9bc94be427a02485b089c2aeb6b494644beb05) to make it consistent with the way it is read by the query. 
   
   There are properties and Avro file metadata that control whether the conversion is performed (e.g., `hive.avro.timestamp.skip.conversion`), but these do not work at the moment for HBase (or for basically anything else that relies on `AvroLazyObjectInspector`). This is a bug that should be fixed, but it is out of the scope of this PR.
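   The convention described above (local time zone to UTC on write, UTC to local on read) can be illustrated as a round trip. This is a hypothetical sketch, not the content of commit fc9bc94: the zone `America/Los_Angeles` is an assumption about the test environment, and the class and method names are illustrative. Storing the wall-clock value through the local zone lets the read side reproduce the same wall-clock value.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class LocalUtcRoundTrip {
    // Write side: interpret the wall-clock value in the local zone and store UTC epoch millis.
    static long write(LocalDateTime wallClock, ZoneId localZone) {
        return wallClock.atZone(localZone).toInstant().toEpochMilli();
    }

    // Read side: convert the stored UTC epoch millis back to local wall-clock time.
    static LocalDateTime read(long epochMillis, ZoneId localZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), localZone);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles"); // assumed test time zone
        LocalDateTime expected = LocalDateTime.of(2022, 7, 5, 0, 0);
        long stored = write(expected, zone);
        System.out.println(stored);             // 1657004400000
        System.out.println(read(stored, zone)); // 2022-07-05T00:00
    }
}
```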




Issue Time Tracking
---

Worklog Id: (was: 788618)
Time Spent: 40m  (was: 0.5h)

[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788365
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 06/Jul/22 18:44
Start Date: 06/Jul/22 18:44
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915150677


##
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##
@@ -0,0 +1,45 @@
+PREHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl
+POSTHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl
+PREHOOK: query: select data_frV4.dischargedate.value from tbl
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl
+ A masked pattern was here 
+POSTHOOK: query: select data_frV4.dischargedate.value from tbl
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl
+ A masked pattern was here 
+1970-01-19 20:16:19.2

Review Comment:
   `2022-07-05 00:00:00` = 1657004400
   `1970-01-19 20:16:19.2` = 1657004
   So somewhere we are clipping the last three digits. I will check.
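   As a worked check of the arithmetic: midnight on 2022-07-05 UTC is 1656979200 epoch seconds (1657004400 is midnight at UTC-7, i.e. Pacific daylight time). If that seconds value is read as epoch milliseconds, as a `timestamp-millis` reader does, it lands in January 1970, which accounts for the "clipped" digits. A sketch assuming a US/Pacific display zone; the names are illustrative.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class SecondsAsMillis {
    // Value written by the first version of the test: epoch *seconds* for 2022-07-05 00:00 UTC.
    static long storedSeconds() {
        return LocalDate.of(2022, 7, 5).atStartOfDay().toEpochSecond(ZoneOffset.UTC);
    }

    // A timestamp-millis reader interprets the same number as milliseconds.
    static Instant readAsMillis(long value) {
        return Instant.ofEpochMilli(value);
    }

    public static void main(String[] args) {
        long value = storedSeconds();
        Instant misread = readAsMillis(value);
        System.out.println(value);   // 1656979200
        System.out.println(misread); // 1970-01-20T04:16:19.200Z
        // Shown in Pacific standard time (UTC-8 in January 1970): 1970-01-19T20:16:19.200
        System.out.println(LocalDateTime.ofInstant(misread, ZoneId.of("America/Los_Angeles")));
    }
}
```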





Issue Time Tracking
---

Worklog Id: (was: 788365)
Time Spent: 0.5h  (was: 20m)

[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788361
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 06/Jul/22 18:22
Start Date: 06/Jul/22 18:22
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915124542


##
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##
@@ -0,0 +1,45 @@
+PREHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl
+POSTHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl
+PREHOOK: query: select data_frV4.dischargedate.value from tbl
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl
+ A masked pattern was here 
+POSTHOOK: query: select data_frV4.dischargedate.value from tbl
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl
+ A masked pattern was here 
+1970-01-19 20:16:19.2

Review Comment:
   The result does not appear to be correct. I was expecting to see something 
like `2022-07-05 00:00:00`. 
   
   Are we doing something wrong when inserting the data? Are we doing something 
wrong while fetching the data?



##
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
 }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+String dataDir = System.getProperty("test.data.dir");
+Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));

Review Comment:
   This path concatenation may not be portable to different OS.
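   A portable alternative to concatenating with "/" is to let the JDK join the components, either with `new File(dataDir, "nested_ts.avsc")` or `Paths.get(dataDir, "nested_ts.avsc")`. Minimal sketch; the `"data"` fallback for a missing `test.data.dir` property is a hypothetical default added only so the example runs standalone.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

class PortablePath {
    // Let the platform choose the separator instead of hard-coding "/".
    static Path schemaPath(String dataDir) {
        return Paths.get(dataDir, "nested_ts.avsc");
    }

    public static void main(String[] args) {
        // "data" is a hypothetical fallback used only when the property is unset.
        String dataDir = System.getProperty("test.data.dir", "data");
        File schemaFile = schemaPath(dataDir).toFile();
        System.out.println(schemaFile.getPath());
    }
}
```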



##
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
 }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+String dataDir = System.getProperty("test.data.dir");
+Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));

Review Comment:
   Make method static.



##
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
 }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+String dataDir = System.getProperty("test.data.dir");
+Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));
+GenericData.Record rootRecord = new GenericData.Record(schema);
+rootRecord.put("id", "X338092");
+GenericData.Record dateRecord = new GenericData.Record(schema.getField("dischargedate").schema());
+dateRecord.put("value", LocalDate.of(2022,7,5).atStartOfDay().toEpochSecond(ZoneOffset.UTC));
+rootRecord.put("dischargedate", dateRecord);
+
+try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
+  try (DataFileWriter dataFileWriter
+ = new DataFileWriter(new GenericDatumWriter<>(schema))) {
+dataFileWriter.create(schema, out);
+dataFileWriter.append(rootRecord);
+  }
+  return out.toByteArray();
+}
+  }
+
+  private void createAvroTable() throws IOException {
+final String HBASE_TABLE_NAME = "HiveAvroTable";
+HTableDescriptor htableDesc = new HTableDescriptor(TableName.valueOf(HBASE_TABLE_NAME));
+htableDesc.addFamily(new HColumnDescriptor("data".getBytes()));
+
+try (Admin hbaseAdmin = hbaseConn.getAdmin()) {
+  hbaseAdmin.createTable(htableDesc);
+  try (Table table = hbaseConn.getTable(TableName.valueOf(HBASE_TABLE_NAME))) {


[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788355
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 06/Jul/22 17:40
Start Date: 06/Jul/22 17:40
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 opened a new pull request, #3418:
URL: https://github.com/apache/hive/pull/3418

   
   ### What changes were proposed in this pull request?
   `isPrimitive` returns `true` for Timestamp
   
   
   ### Why are the changes needed?
   `isPrimitive` was returning `false` for `Timestamp`, so the value was not converted to `LazyTimestamp`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   `mvn test -Dtest=TestHBaseCliDriver -Dtest.output.overwrite=true -Dqfile=hbase_avro_nested_timestamp.q`
   




Issue Time Tracking
---

Worklog Id: (was: 788355)
Remaining Estimate: 0h
Time Spent: 10m
