[ 
https://issues.apache.org/jira/browse/HUDI-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tao meng updated HUDI-1662:
---------------------------
    Description: 
Step 1: prepare a raw DataFrame with a DateType column and insert it into a Hudi MOR table

df_raw.withColumn("date", lit(Date.valueOf("2020-11-10")))

merge(df_raw, "bulk_insert", "huditest.bulkinsert_mor_10g")

Step 2: prepare an update DataFrame with a DateType column and upsert it into the Hudi MOR table

df_update = sql("select * from huditest.bulkinsert_mor_10g_rt").withColumn("date", lit(Date.valueOf("2020-11-11")))

merge(df_update, "upsert", "huditest.bulkinsert_mor_10g")

 

Step 3: query the mor_rt table with Hive beeline or spark-sql

In beeline or spark-sql, execute the statement select * from huditest.bulkinsert_mor_10g_rt where primary_key = 10000000;

The following error then occurs:

_java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritableV2_

 
 
Root cause analysis:

Hudi stores log files in Avro format, and Avro stores DateType as INT (the type is INT, but the logicalType is date).

When Hudi reads a log file and converts an Avro INT record to a Writable, the logicalType is not respected, so the date value is converted to an IntWritable.

See: 
[https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java#L169]
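
To make the encoding described above concrete, here is a minimal sketch that uses only the JDK. It does not touch Avro or Hudi classes; the class and method names are hypothetical and exist only for illustration. It relies on the Avro specification's definition of the date logical type as an int counting days since 1970-01-01.

```java
import java.time.LocalDate;

// Stand-alone illustration; uses only the JDK, not Avro or Hudi classes.
final class AvroDateEncodingSketch {

    // Avro's "date" logical type stores the number of days since 1970-01-01 in an int.
    static int toAvroDate(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Decoding needs the logical type: the raw int alone does not say it is a date.
    static LocalDate fromAvroDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    public static void main(String[] args) {
        int stored = toAvroDate(LocalDate.of(2020, 11, 11));
        System.out.println("stored int: " + stored);
        System.out.println("decoded date: " + fromAvroDate(stored));
    }
}
```

A reader that sees only the INT type and ignores the logicalType has no way to tell this value apart from an ordinary integer column, which is consistent with the IntWritable in the exception.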
 
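Modification plan: when converting an Avro INT value to a Writable, the logicalType must be considered:
// Compare strings with equals(); == compares references in Java.
// Note: the exception above shows Hive 3.1.1 expecting DateWritableV2, so the
// date class may need to match the Hive version in use.
if (schema.getLogicalType() != null && "date".equals(schema.getLogicalType().getName())) {
 return new DateWritable((Integer) value);
} else {
 return new IntWritable((Integer) value);
}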
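
The branch above can be tried out in isolation with the following sketch. It is not Hudi code: FieldSchema is a hypothetical stand-in for the one piece of an Avro schema the branch needs, and java.time.LocalDate stands in for Hive's date writable so that the example runs with the JDK alone.

```java
import java.time.LocalDate;

// Hypothetical stand-ins only; no Avro, Hive, or Hudi classes are used here.
final class LogicalTypeDispatchSketch {

    // Mimics the single schema attribute the conversion depends on.
    static final class FieldSchema {
        final String logicalTypeName; // null when the INT carries no logical type

        FieldSchema(String logicalTypeName) {
            this.logicalTypeName = logicalTypeName;
        }
    }

    // A date-typed INT becomes a date; any other INT stays an Integer.
    static Object convertInt(FieldSchema schema, Object value) {
        int raw = (Integer) value;
        // "date".equals(...) is null-safe and compares content, unlike ==.
        if ("date".equals(schema.logicalTypeName)) {
            return LocalDate.ofEpochDay(raw);
        }
        return raw;
    }

    public static void main(String[] args) {
        System.out.println(convertInt(new FieldSchema("date"), 18577));
        System.out.println(convertInt(new FieldSchema(null), 18577));
    }
}
```

Putting the literal on the left of equals() also removes the need for a separate null check on the logical type name in this sketch.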

  was:
Step 1: prepare a raw DataFrame with a DateType column and insert it into a Hudi MOR table

df_raw.withColumn("date", lit(Date.valueOf("2020-11-10")))

merge(df_raw, "bulk_insert", "huditest.bulkinsert_mor_10g")

Step 2: prepare an update DataFrame with a DateType column and upsert it into the Hudi MOR table

df_update = sql("select * from huditest.bulkinsert_mor_10g_rt").withColumn("date", lit(Date.valueOf("2020-11-11")))

merge(df_update, "upsert", "huditest.bulkinsert_mor_10g")

 

Step 3: query the mor_rt table with Hive beeline or spark-sql

!image-2021-03-05-10-06-11-949.png!


>  Failed to query real-time view use hive/spark-sql when hudi mor table 
> contains dateType
> ----------------------------------------------------------------------------------------
>
>                 Key: HUDI-1662
>                 URL: https://issues.apache.org/jira/browse/HUDI-1662
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>    Affects Versions: 0.7.0
>         Environment: hive 3.1.1
> spark 2.4.5
> hadoop 3.1.1
> suse os
>            Reporter: tao meng
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
