[ 
https://issues.apache.org/jira/browse/HUDI-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1662:
---------------------------------
    Labels: pull-request-available  (was: )

>  Failed to query real-time view use hive/spark-sql when hudi mor table 
> contains dateType
> ----------------------------------------------------------------------------------------
>
>                 Key: HUDI-1662
>                 URL: https://issues.apache.org/jira/browse/HUDI-1662
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>    Affects Versions: 0.7.0
>         Environment: hive 3.1.1
> spark 2.4.5
> hadoop 3.1.1
> suse os
>            Reporter: tao meng
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Step 1: prepare a raw DataFrame with a DateType column and insert it into the Hudi MOR table:
> df_raw.withColumn("date", lit(Date.valueOf("2020-11-10")))
> merge(df_raw, "bulk_insert", "huditest.bulkinsert_mor_10g")
> Step 2: prepare an update DataFrame with a DateType column and upsert it into the Hudi MOR table:
>  df_update = sql("select * from 
> huditest.bulkinsert_mor_10g_rt").withColumn("date", 
> lit(Date.valueOf("2020-11-11")))
> merge(df_update, "upsert", "huditest.bulkinsert_mor_10g")
>  
> Step 3: query the MOR _rt table from Hive beeline or spark-sql by executing:
> select * from huditest.bulkinsert_mor_10g_rt where primary_key = 10000000;
> The following error then occurs:
> _java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be 
> cast to org.apache.hadoop.hive.serde2.io.DateWritableV2_
>  
>   
>  Root cause analysis:
> Hudi stores log files in Avro format, and Avro stores DateType as INT (the 
> type is INT but the logicalType is date).
> When Hudi reads a log file and converts an Avro INT record to a Writable, 
> the logicalType is not respected, so the date value is converted to an 
> IntWritable.
> See: 
> [https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java#L169]
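
For context on the root cause: Avro's `date` logical type annotates an `int` that holds the number of days since 1970-01-01. The sketch below uses only the JDK (the class and method names are illustrative and not part of Hudi or Avro) to show what the raw value looks like for the date used in step 1, and that it only reads back as a date when the logical type is taken into account.

```java
import java.time.LocalDate;

class AvroDateEncodingSketch {
    // Encode a date the way the Avro `date` logical type does: days since the Unix epoch.
    static int toEpochDays(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Decode the raw int back into a date; only meaningful when the
    // schema's logical type is known to be `date`.
    static LocalDate fromEpochDays(int days) {
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        int raw = toEpochDays(LocalDate.of(2020, 11, 10));
        System.out.println(raw);                 // 18576
        System.out.println(fromEpochDays(raw));  // 2020-11-10
    }
}
```

Without the logical type, a reader sees only the plain integer, which matches the IntWritable reported in the exception.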
>   
>  Modification plan: when converting an Avro INT type to a Writable, the 
> logicalType must be considered:
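> case INT:
>   // Avro encodes the date logical type as an INT, so check the logical type first.
>   if (schema.getLogicalType() != null && 
> schema.getLogicalType().getName().equals("date")) {
>     return new DateWritable((Integer) value);
>   } else {
>     // A plain INT without a date logical type stays an IntWritable.
>     return new IntWritable((Integer) value);
>   }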
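
The branch proposed above can be tried out in isolation. The following is a minimal sketch that assumes simple stand-in classes (`IntValue` and `DateValue` are hypothetical and only mimic the roles of Hadoop's IntWritable and Hive's date writable) and passes the logical type name as a plain string instead of an Avro schema. It is not the Hudi implementation.

```java
import java.time.LocalDate;

class LogicalTypeDispatchSketch {
    // Hypothetical stand-in for org.apache.hadoop.io.IntWritable.
    static final class IntValue {
        final int value;
        IntValue(int value) { this.value = value; }
    }

    // Hypothetical stand-in for Hive's date writable, built from days since epoch.
    static final class DateValue {
        final LocalDate date;
        DateValue(int daysSinceEpoch) { this.date = LocalDate.ofEpochDay(daysSinceEpoch); }
    }

    // logicalTypeName is null when the schema carries no logical type.
    static Object intToWritable(int value, String logicalTypeName) {
        if ("date".equals(logicalTypeName)) {
            return new DateValue(value);
        }
        return new IntValue(value);
    }

    public static void main(String[] args) {
        Object withType = intToWritable(18577, "date");
        Object withoutType = intToWritable(18577, null);
        System.out.println(withType instanceof DateValue);   // true
        System.out.println(withoutType instanceof IntValue); // true
        System.out.println(((DateValue) withType).date);     // 2020-11-11
    }
}
```

Writing the comparison as `"date".equals(logicalTypeName)` handles a missing logical type without a separate null check.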



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
