cdmikechen opened a new pull request #770: remove com.databricks:spark-avro to 
build spark avro schema to insert parquet.
URL: https://github.com/apache/incubator-hudi/pull/770
 
 
   Provide a way for hoodie to support `timestamp` and `decimal`.
   Change the type of timestamp from long to `int64` (logical_type=`timestamp-millis`).
   Change the type of date from int to `int32` (logical_type=`date`).
   Change the type of decimal from string to `fixed` (logical_type=`decimal`).
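
To make the type changes listed above concrete, here is a minimal sketch (not the code in this pull request) of an Avro record schema that carries the three logical types, together with the value encodings the Avro specification defines for them. The record and field names, the precision of 18, the scale of 2, the fixed size of 8 bytes, and the helper function names are all assumptions chosen for illustration.

```python
from datetime import date, datetime, timedelta, timezone
from decimal import Decimal

# Hypothetical schema: names, precision, scale and size are illustrative only.
SCHEMA = {
    "type": "record",
    "name": "ExampleRecord",
    "fields": [
        {"name": "event_time",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "event_date",
         "type": {"type": "int", "logicalType": "date"}},
        {"name": "amount",
         "type": {"type": "fixed", "name": "amount_fixed", "size": 8,
                  "logicalType": "decimal", "precision": 18, "scale": 2}},
    ],
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(dt: datetime) -> int:
    """timestamp-millis: milliseconds since the Unix epoch (dt must be tz-aware)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def to_date_days(d: date) -> int:
    """date: number of days since 1970-01-01."""
    return (d - date(1970, 1, 1)).days


def to_decimal_fixed(value: Decimal, scale: int, size: int) -> bytes:
    """decimal: big-endian two's-complement bytes of the unscaled integer."""
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(scaled).to_bytes(size, byteorder="big", signed=True)
```

For example, `to_decimal_fixed(Decimal("12.34"), scale=2, size=8)` stores the unscaled integer 1234 in 8 bytes. On the Parquet side these logical types correspond to `INT64` (timestamp-millis), `INT32` (date) and `FIXED_LEN_BYTE_ARRAY` (decimal).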
   
   In Spark, hoodie can correctly convert all of the primitive data types 
into the corresponding Parquet types 
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L372).
 
   In Hive, hoodie can correctly convert the primitive `decimal` type into 
the Parquet type, but `timestamp` can only be read as long (ParquetHiveSerDe 
cannot read logical_type).
   
   Another thing to mention: we need to replace avro*-1.7.7.jar in 
`SPARK_HOME/jars` with avro*-1.8.2.jar, so that Spark can use the logical type 
classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
