I think the read path is engine-specific, because Hudi does not define
its own Parquet reader yet; for example, the Flink reader can read int96 as
a timestamp based on the declared precision.

Best,
Danny Chan

lrz <[email protected]> wrote on Thu, Apr 1, 2021 at 12:04 PM:

> Hi, I want to discuss support for the timestamp data type.
> As we know, Hudi currently saves the timestamp type as long, which leads to
> several problems when a table includes a timestamp column:
> 1) The bootstrap operation fails if the original parquet file was written
> by a Spark application, because Spark saves timestamps as int96 by default
> (see spark.sql.parquet.int96AsTimestamp) and Hudi cannot read the int96
> type yet. (This can be solved by upgrading Parquet to 1.12.0 and setting
> parquet.avro.readInt96AsFixed=true; please check
> https://github.com/apache/parquet-mr/pull/831/files)
>
> 2) After bootstrap, upsert fails because we use the Hudi schema to
> read the original parquet file. The schemas do not match, because the Hudi
> schema treats the timestamp as long while the original file stores it as int96.
>
> 3) After bootstrap, a partial update of a parquet file fails,
> because we copy the old record and save it with the Hudi schema (we miss a
> convertFixedToLong operation like the one Spark performs).
>
> 4) If we set hoodie.datasource.hive_sync.support_timestamp=true, we
> get a convert-type exception when reading the RT view, because we miss
> converting LongWritable to TimestampWritableV2 in
> HoodieRealtimeRecordReaderUtils.
>
> To solve these issues, we need to upgrade the Parquet version and add some
> configs. Please help find a good solution, thank you very much!
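For reference, the convertFixedToLong step mentioned in point 3 can be sketched as below. This is a minimal illustration, not Hudi or Spark code (the class and method names are hypothetical); it assumes the standard Parquet INT96 timestamp layout of 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes of the Julian day number:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of converting a Parquet INT96 (12-byte fixed)
// timestamp into microseconds since the Unix epoch, i.e. the kind of
// convertFixedToLong step described in the message above.
public class Int96Decode {
    // Julian day number corresponding to 1970-01-01 (Unix epoch).
    private static final long JULIAN_EPOCH_DAY = 2_440_588L;
    private static final long MICROS_PER_DAY = 86_400L * 1_000_000L;

    public static long int96ToMicros(byte[] fixed12) {
        ByteBuffer buf = ByteBuffer.wrap(fixed12).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();                    // bytes 0..7
        long julianDay = Integer.toUnsignedLong(buf.getInt()); // bytes 8..11
        return (julianDay - JULIAN_EPOCH_DAY) * MICROS_PER_DAY
                + nanosOfDay / 1_000L;
    }
}
```

With parquet.avro.readInt96AsFixed=true, the Avro reader surfaces INT96 values as 12-byte fixed fields, which a conversion like this could then map onto a long-typed timestamp field in the Hudi schema.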
