[ https://issues.apache.org/jira/browse/HUDI-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
tao meng updated HUDI-1662:
---------------------------
    Description:
step1: prepare a raw DataFrame with a DateType column and insert it into the Hudi MOR table
df_raw.withColumn("date", lit(Date.valueOf("2020-11-10")))
merge(df_raw, "bulk_insert", "huditest.bulkinsert_mor_10g")

step2: prepare an update DataFrame with a DateType column and upsert it into the Hudi MOR table
df_update = sql("select * from huditest.bulkinsert_mor_10g_rt").withColumn("date", lit(Date.valueOf("2020-11-11")))
merge(df_update, "upsert", "huditest.bulkinsert_mor_10g")

step3: use hive-beeline/spark-sql to query the mor_rt table
In beeline/spark-sql, execute the statement
select * from huditest.bulkinsert_mor_10g_rt where primary_key = 10000000;
The following error then occurs:
_java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritableV2_

Root cause analysis:
Hudi stores log files in Avro format, and Avro stores DateType as INT (the type is INT, but the logicalType is date).
When Hudi reads a log file and converts an Avro INT record to a Writable, the logicalType is not respected, so the date value is converted to an IntWritable.
see: [https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java#L169]

Modification plan: when converting an Avro INT type to a Writable, the logicalType must be considered
if (schema.getLogicalType() != null && "date".equals(schema.getLogicalType().getName())) {
  return new DateWritable((Integer) value);
} else {
  return new IntWritable((Integer) value);
}

  was:
step1: prepare a raw DataFrame with a DateType column and insert it into the Hudi MOR table
df_raw.withColumn("date", lit(Date.valueOf("2020-11-10")))
merge(df_raw, "bulk_insert", "huditest.bulkinsert_mor_10g")

step2: prepare an update DataFrame with a DateType column and upsert it into the Hudi MOR table
df_update = sql("select * from huditest.bulkinsert_mor_10g_rt").withColumn("date", lit(Date.valueOf("2020-11-11")))
merge(df_update, "upsert", "huditest.bulkinsert_mor_10g")

step3: use hive-beeline/spark-sql to query the mor_rt table
!image-2021-03-05-10-06-11-949.png!

> Failed to query real-time view use hive/spark-sql when hudi mor table contains dateType
> ----------------------------------------------------------------------------------------
>
>                 Key: HUDI-1662
>                 URL: https://issues.apache.org/jira/browse/HUDI-1662
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>    Affects Versions: 0.7.0
>         Environment: hive 3.1.1
>                      spark 2.4.5
>                      hadoop 3.1.1
>                      suse os
>            Reporter: tao meng
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> step1: prepare a raw DataFrame with a DateType column and insert it into the Hudi MOR table
> df_raw.withColumn("date", lit(Date.valueOf("2020-11-10")))
> merge(df_raw, "bulk_insert", "huditest.bulkinsert_mor_10g")
>
> step2: prepare an update DataFrame with a DateType column and upsert it into the Hudi MOR table
> df_update = sql("select * from huditest.bulkinsert_mor_10g_rt").withColumn("date", lit(Date.valueOf("2020-11-11")))
> merge(df_update, "upsert", "huditest.bulkinsert_mor_10g")
>
> step3: use hive-beeline/spark-sql to query the mor_rt table
> In beeline/spark-sql, execute the statement
> select * from huditest.bulkinsert_mor_10g_rt where primary_key = 10000000;
> The following error then occurs:
> _java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritableV2_
>
> Root cause analysis:
> Hudi stores log files in Avro format, and Avro stores DateType as INT (the type is INT, but the logicalType is date).
> When Hudi reads a log file and converts an Avro INT record to a Writable, the logicalType is not respected, so the date value is converted to an IntWritable.
> see: [https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java#L169]
>
> Modification plan: when converting an Avro INT type to a Writable, the logicalType must be considered
> if (schema.getLogicalType() != null && "date".equals(schema.getLogicalType().getName())) {
>   return new DateWritable((Integer) value);
> } else {
>   return new IntWritable((Integer) value);
> }

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
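For illustration only, the branching in the modification plan can be sketched without Hadoop or Hive on the classpath. The class names `LogicalTypeAwareConversion`, `IntValue`, and `DateValue` are hypothetical stand-ins (for the utility class, `IntWritable`, and a date writable respectively), not Hudi or Hive APIs; the sketch assumes only that an Avro `date` value is an int counting days since 1970-01-01, and it compares the logical type name with `equals`, since `==` on Java strings compares references.

```java
import java.time.LocalDate;

public class LogicalTypeAwareConversion {

    // Hypothetical stand-in for org.apache.hadoop.io.IntWritable.
    static final class IntValue {
        final int value;
        IntValue(int value) { this.value = value; }
        @Override public String toString() { return "IntValue(" + value + ")"; }
    }

    // Hypothetical stand-in for a Hive date writable.
    static final class DateValue {
        final LocalDate value;
        DateValue(LocalDate value) { this.value = value; }
        @Override public String toString() { return "DateValue(" + value + ")"; }
    }

    // logicalTypeName plays the role of schema.getLogicalType().getName();
    // null means the Avro INT field carries no logical type.
    static Object toWritable(String logicalTypeName, Object value) {
        int raw = (Integer) value;
        if ("date".equals(logicalTypeName)) {
            // Avro "date": number of days since the Unix epoch.
            return new DateValue(LocalDate.ofEpochDay(raw));
        }
        return new IntValue(raw);
    }

    public static void main(String[] args) {
        int days = (int) LocalDate.parse("2020-11-10").toEpochDay();
        System.out.println(toWritable("date", days)); // date branch
        System.out.println(toWritable(null, days));   // plain int branch
    }
}
```

In the real reader the date branch would have to return whichever writable class the target Hive version expects; the exception in the report names `DateWritableV2`.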