zuyanton opened a new issue #2509: URL: https://github.com/apache/hudi/issues/2509
**Describe the problem you faced**

It looks like a column of type `org.apache.spark.sql.types.TimestampType` gets converted to `bigint` when the dataframe is saved to a Hudi table.

**To Reproduce**

Create a dataframe with a `TimestampType` column:
```
var seq = Seq((1, "2020-01-01 11:22:30", 2, 2))
var df = seq.toDF("pk", "time_string", "partition", "sort_key")
df = df.withColumn("timestamp", col("time_string").cast(TimestampType))
```
Preview the dataframe:
```
df.show
```
```
+---+-------------------+---------+--------+-------------------+
| pk|        time_string|partition|sort_key|          timestamp|
+---+-------------------+---------+--------+-------------------+
|  1|2020-01-01 11:22:30|        2|       2|2020-01-01 11:22:30|
+---+-------------------+---------+--------+-------------------+
```
Save the dataframe to a Hudi table:
```
df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Append).save("s3://location")
```
View the Hudi table:
```
spark.sql("select * from testTable2").show
```
Result: the `timestamp` column comes back as a `bigint`:
```
+-------------------+--------------------+------------------+----------------------+--------------------+---+-------------------+--------+----------------+---------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| pk|        time_string|sort_key|       timestamp|partition|
+-------------------+--------------------+------------------+----------------------+--------------------+---+-------------------+--------+----------------+---------+
|     20210201004527|  20210201004527_0_1|              pk:1|                     2|2972ef96-279b-438...|  1|2020-01-01 11:22:30|       2|1577877750000000|        2|
+-------------------+--------------------+------------------+----------------------+--------------------+---+-------------------+--------+----------------+---------+
```
View the schema:
```
spark.sql("describe testTable2").show
```
Result:
```
+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
| _hoodie_commit_time|   string|   null|
|_hoodie_commit_seqno|   string|   null|
|  _hoodie_record_key|   string|   null|
|_hoodie_partition...|   string|   null|
|   _hoodie_file_name|   string|   null|
|                  pk|      int|   null|
|         time_string|   string|   null|
|            sort_key|      int|   null|
|           timestamp|   bigint|   null|
|           partition|      int|   null|
|# Partition Infor...|         |       |
|          # col_name|data_type|comment|
|           partition|      int|   null|
+--------------------+---------+-------+
```

**Environment Description**

* Hudi version : 0.7.0
* Spark version :
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
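For reference, the `bigint` value shown above is the timestamp encoded as microseconds since the Unix epoch (1577877750000000 µs = 2020-01-01 11:22:30 UTC), which matches Parquet's `timestamp-micros` representation. A minimal sketch for decoding it back on the query side, assuming the synced table is queryable as `testTable2`:
```
// Sketch: decode the bigint column by treating it as epoch microseconds.
import org.apache.spark.sql.functions.col

spark.table("testTable2")
  .withColumn("timestamp_decoded", (col("timestamp") / 1000000L).cast("timestamp"))
  .select("pk", "timestamp", "timestamp_decoded")
  .show(false)
```
Casting a numeric column to `timestamp` in Spark interprets the value as seconds since the epoch, hence the division by 1,000,000.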
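Hive sync also exposes a `hoodie.datasource.hive_sync.support_timestamp` option that registers `timestamp-micros` fields as Hive `timestamp` instead of `bigint`; whether it is available and effective in this exact Hudi version is an assumption, not something verified here. A sketch of the write with that option set, reusing the `hudiOptions` map and `df` from the full snippet in the Additional context below:
```
// Sketch (assumption: the hive-sync support_timestamp option exists in the
// Hudi version in use). hudiOptions and df are defined in the snippet below.
val hudiOptionsWithTs = hudiOptions +
  ("hoodie.datasource.hive_sync.support_timestamp" -> "true")

df.write.format("org.apache.hudi")
  .options(hudiOptionsWithTs)
  .mode(SaveMode.Append)
  .save("s3://location")
```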
**Additional context**

Full code snippet:
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.hudi.hive.MultiPartKeysValueExtractor
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.keygen.ComplexKeyGenerator
import org.apache.hudi.common.model.DefaultHoodieRecordPayload
import org.apache.hadoop.hive.conf.HiveConf

val hiveConf = new HiveConf()
val hiveMetastoreURI = hiveConf.get("hive.metastore.uris").replaceAll("thrift://", "")
val hiveServer2URI = hiveMetastoreURI.substring(0, hiveMetastoreURI.lastIndexOf(":"))

var hudiOptions = Map[String, String](
  HoodieWriteConfig.TABLE_NAME -> "testTable2",
  "hoodie.consistency.check.enabled" -> "true",
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "COPY_ON_WRITE",
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "pk",
  DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "sort_key",
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> "testTable2",
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:10000",
  "hoodie.payload.ordering.field" -> "sort_key",
  DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY -> classOf[DefaultHoodieRecordPayload].getName
)

//spark.sql("drop table if exists testTable1")
var seq = Seq((1, "2020-01-01 11:22:30", 2, 2))
var df = seq.toDF("pk", "time_string", "partition", "sort_key")
df = df.withColumn("timestamp", col("time_string").cast(TimestampType))
df.show
df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Append).save("s3://location")
spark.sql("select * from testTable2").show
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org