[ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788361
 ]

ASF GitHub Bot logged work on HIVE-26373:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Jul/22 18:22
            Start Date: 06/Jul/22 18:22
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915124542


##########
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##########
@@ -0,0 +1,45 @@
+PREHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+#### A masked pattern was here ####
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl
+POSTHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+#### A masked pattern was here ####
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl
+PREHOOK: query: select data_frV4.dischargedate.value from tbl
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl
+#### A masked pattern was here ####
+POSTHOOK: query: select data_frV4.dischargedate.value from tbl
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl
+#### A masked pattern was here ####
+1970-01-19 20:16:19.2

Review Comment:
   The result does not appear to be correct. I was expecting to see something 
like `2022-07-05 00:00:00`. 
   
   Are we doing something wrong when inserting the data? Are we doing something 
wrong while fetching the data?
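
   One possible explanation, offered as a sketch rather than a confirmed diagnosis: the helper writes `toEpochSecond(...)`, i.e. seconds, and the value may be read back as milliseconds. The class name `EpochUnitCheck` and its methods are hypothetical, and the assumptions that the field is interpreted as epoch milliseconds and that the test JVM runs in a UTC-8 zone such as `America/Los_Angeles` are not stated anywhere in this thread.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, not part of the PR.
public class EpochUnitCheck {

    // The value the test helper stores: epoch SECONDS for 2022-07-05T00:00:00Z.
    static long writtenValue() {
        return LocalDate.of(2022, 7, 5).atStartOfDay().toEpochSecond(ZoneOffset.UTC);
    }

    // Interpret a raw long as epoch MILLISECONDS and render it in the given zone.
    static LocalDateTime readAsMillis(long raw, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(raw), zone);
    }

    public static void main(String[] args) {
        long raw = writtenValue();
        // Assumption: the reader treats the stored long as milliseconds.
        System.out.println(readAsMillis(raw, ZoneOffset.UTC));
        // Assumption: the test JVM uses a UTC-8 zone in January 1970.
        System.out.println(readAsMillis(raw, ZoneId.of("America/Los_Angeles")));
        // Scaling seconds to milliseconds before writing gives the expected date.
        System.out.println(readAsMillis(raw * 1000L, ZoneOffset.UTC));
    }
}
```

   Under those assumptions the second line printed corresponds to the `1970-01-19 20:16:19.2` seen in the q.out file, which would point at the insertion side (seconds written where milliseconds are expected) rather than the fetch side.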



##########
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##########
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));

Review Comment:
   This path concatenation may not be portable across operating systems.
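
   A minimal sketch of one way to avoid the hand-written separator, using `java.nio.file.Paths` from the JDK. The class name `SchemaPathExample`, the method `schemaFile`, and the fallback directory `data` are hypothetical and only for illustration; the PR may resolve this differently.

```java
import java.io.File;
import java.nio.file.Paths;

// Hypothetical helper, not part of the PR.
public class SchemaPathExample {

    // Let the JDK join the directory and the file name with the
    // platform's separator instead of concatenating "/" by hand.
    static File schemaFile(String dataDir) {
        return Paths.get(dataDir, "nested_ts.avsc").toFile();
    }

    public static void main(String[] args) {
        // "data" is an assumed fallback for when test.data.dir is unset.
        String dataDir = System.getProperty("test.data.dir", "data");
        System.out.println(schemaFile(dataDir));
    }
}
```

   The resulting `File` can be passed to `new Schema.Parser().parse(...)` exactly as before.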



##########
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##########
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));

Review Comment:
   Make this method static.



##########
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##########
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));
+    GenericData.Record rootRecord = new GenericData.Record(schema);
+    rootRecord.put("id", "X338092");
+    GenericData.Record dateRecord = new GenericData.Record(schema.getField("dischargedate").schema());
+    dateRecord.put("value", LocalDate.of(2022,7,5).atStartOfDay().toEpochSecond(ZoneOffset.UTC));
+    rootRecord.put("dischargedate", dateRecord);
+
+    try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
+      try (DataFileWriter<GenericRecord> dataFileWriter
+             = new DataFileWriter<GenericRecord>(new GenericDatumWriter<>(schema))) {
+        dataFileWriter.create(schema, out);
+        dataFileWriter.append(rootRecord);
+      }
+      return out.toByteArray();
+    }
+  }
+
+  private void createAvroTable() throws IOException {
+    final String HBASE_TABLE_NAME = "HiveAvroTable";
+    HTableDescriptor htableDesc = new HTableDescriptor(TableName.valueOf(HBASE_TABLE_NAME));
+    htableDesc.addFamily(new HColumnDescriptor("data".getBytes()));
+
+    try (Admin hbaseAdmin = hbaseConn.getAdmin()) {
+      hbaseAdmin.createTable(htableDesc);
+      try (Table table = hbaseConn.getTable(TableName.valueOf(HBASE_TABLE_NAME))) {

Review Comment:
   Extract `TableName.valueOf("HiveAvroTable")` to a local variable and replace 
the occurrences.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 788361)
    Time Spent: 20m  (was: 10m)

> ClassCastException while inserting Avro data into Hbase table for nested 
> struct with Timestamp
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26373
>                 URL: https://issues.apache.org/jira/browse/HIVE-26373
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> For Avro data where the schema has a nested struct with a Timestamp datatype, 
> we get the following ClassCastException:
> {code:java}
> 2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] 
> mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>         at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
>         at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>         at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
>         at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.type.Timestamp cannot be cast to 
> org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
>         at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
>         at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
>         at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
>         at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
>         at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
>         at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
>         at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
>         at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>         at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>         at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
>         at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
>         at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
>         ... 11 more {code}
> The problem starts in the {{toLazyObject}} method of 
> {*}AvroLazyObjectInspector.java{*}, when 
> [this|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L347]
>  condition returns false for {*}Timestamp{*}, preventing the conversion of 
> *Timestamp* to *LazyTimestamp* 
> [here|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java#L132].
> The solution is to return {{true}} for Timestamps in the {{isPrimitive}} 
> method.
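
The failure mode described above can be illustrated with a simplified sketch. Everything in it is hypothetical: `Ts`, `LazyTs`, `toLazy`, `serialize`, and the two predicates are stand-ins invented for illustration and are not Hive classes or Hive's actual logic. It only shows how a "wrap only if primitive" check that omits one type lets a raw value reach code that casts to the wrapper type.

```java
import java.util.function.Predicate;

// Hypothetical illustration only; none of these types exist in Hive.
public class LazyWrapSketch {

    static final class Ts {            // stand-in for a raw timestamp value
        final long millis;
        Ts(long millis) { this.millis = millis; }
    }

    static final class LazyTs {        // stand-in for the lazy wrapper
        final Ts value;
        LazyTs(Ts value) { this.value = value; }
    }

    // Check that does not recognise Ts, analogous to the condition returning false.
    static final Predicate<Object> BEFORE = o -> o instanceof String || o instanceof Number;
    // Same check extended to accept Ts, analogous to the proposed fix.
    static final Predicate<Object> AFTER = BEFORE.or(o -> o instanceof Ts);

    // Wrap the field only when the check says it is primitive.
    static Object toLazy(Object field, Predicate<Object> isPrimitive) {
        if (field instanceof Ts && isPrimitive.test(field)) {
            return new LazyTs((Ts) field);
        }
        return field; // raw object passes through unwrapped
    }

    // Downstream code that assumes it always receives the wrapper.
    static long serialize(Object o) {
        return ((LazyTs) o).value.millis; // ClassCastException for a raw Ts
    }

    public static void main(String[] args) {
        Ts ts = new Ts(0L);
        try {
            serialize(toLazy(ts, BEFORE));
        } catch (ClassCastException e) {
            System.out.println("before: ClassCastException");
        }
        System.out.println("after: " + serialize(toLazy(ts, AFTER)));
    }
}
```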



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
