cdmikechen opened a new pull request #770: remove com.databricks:spark-avro to build spark avro schema to insert parquet
URL: https://github.com/apache/incubator-hudi/pull/770

Provide a way to let hoodie support `timestamp` and `decimal`:

- Change the type of timestamp from long to `int64` (logical_type=`timestamp-millis`).
- Change the type of date from int to `int32` (logical_type=`date`).
- Change the type of decimal from string to `fixed` (logical_type=`decimal`).

In Spark, hoodie can correctly convert all primitive data types into the corresponding Parquet types (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L372).

In Hive, hoodie can correctly convert the `decimal` primitive type into the Parquet type, but `timestamp` is read only as a long, because ParquetHiveSerDe cannot read the logical_type.

Another thing to mention: we need to replace avro*-1.7.7.jar in `SPARK_HOME/jars` with avro*-1.8.2.jar, so that Spark can use the logical type classes.
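
As a rough illustration of the three type changes described above, the sketch below writes out an Avro record schema that uses the `timestamp-millis`, `date`, and `decimal` logical types. It is not code from this PR: the record name, field names, precision, and scale are hypothetical, and the `fixed_size_for_precision` helper is an assumption about how the `fixed` size could be chosen (the smallest signed byte width that can hold the given precision).

```python
import json


def fixed_size_for_precision(precision: int) -> int:
    """Return the smallest byte count whose signed two's-complement
    range can hold every unscaled value with `precision` decimal digits."""
    num_bytes = 1
    # n bytes can store values up to 2**(8n - 1) - 1.
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def build_schema(precision: int = 10, scale: int = 2) -> dict:
    """Build an Avro record schema using the three logical types.

    All names here are hypothetical and chosen for illustration only.
    """
    return {
        "type": "record",
        "name": "ExampleRecord",
        "fields": [
            # Avro long (stored as Parquet int64) annotated as a timestamp.
            {"name": "event_time",
             "type": {"type": "long", "logicalType": "timestamp-millis"}},
            # Avro int (stored as Parquet int32) annotated as a date.
            {"name": "event_date",
             "type": {"type": "int", "logicalType": "date"}},
            # Avro fixed annotated as a decimal with explicit precision/scale.
            {"name": "amount",
             "type": {
                 "type": "fixed",
                 "name": "amount_fixed",
                 "size": fixed_size_for_precision(precision),
                 "logicalType": "decimal",
                 "precision": precision,
                 "scale": scale,
             }},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_schema(), indent=2))
```

With the default precision of 10, the `amount` field becomes a 5-byte `fixed`, since 2^39 exceeds 10^10 while 2^31 does not.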
