Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19702#discussion_r151569727

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala ---
    @@ -372,23 +381,18 @@ private[parquet] class ParquetSchemaConverter(
         // `TIMESTAMP_MICROS` which are both logical types annotating `INT64`.
         //
         // Originally, Spark SQL uses the same nanosecond timestamp type as Impala and Hive. Starting
    -    // from Spark 1.5.0, we resort to a timestamp type with 100 ns precision so that we can store
    -    // a timestamp into a `Long`. This design decision is subject to change though, for example,
    -    // we may resort to microsecond precision in the future.
    -    //
    -    // For Parquet, we plan to write all `TimestampType` value as `TIMESTAMP_MICROS`, but it's
    -    // currently not implemented yet because parquet-mr 1.8.1 (the version we're currently using)
    -    // hasn't implemented `TIMESTAMP_MICROS` yet, however it supports TIMESTAMP_MILLIS. We will
    -    // encode timestamp values as TIMESTAMP_MILLIS annotating INT64 if
    -    // 'spark.sql.parquet.int64AsTimestampMillis' is set.
    -    //
    -    // TODO Converts `TIMESTAMP_MICROS` once parquet-mr implements that.
    -
    -    case TimestampType if writeTimestampInMillis =>
    -      Types.primitive(INT64, repetition).as(TIMESTAMP_MILLIS).named(field.name)
    -
    +    // from Spark 1.5.0, we resort to a timestamp type with microsecond precision so that we can
    --- End diff --

I think this comment change loses some of the historical context about the behavior around 1.5.0. From 1.5.0 through 2.2.0, timestamps had 100 ns precision, and with this change, starting in 2.3.0, they have microsecond precision, right? The change as written seems to retroactively alter the described behavior, unless the original comment was wrong.
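As context for the discussion above, here is a minimal standalone sketch (not part of the PR) of what the two INT64-based timestamp annotations look like when built with the parquet-mr Types builder API. It assumes a parquet-mr version that already defines `OriginalType.TIMESTAMP_MICROS` (1.9.0 or later), and the object and field names are hypothetical. Either way the physical value is a Long: at microsecond precision a signed 64-bit value spans roughly ±292,000 years around the epoch, versus roughly ±29,000 years at 100 ns precision.

    import org.apache.parquet.schema.MessageType
    import org.apache.parquet.schema.OriginalType.{TIMESTAMP_MICROS, TIMESTAMP_MILLIS}
    import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64
    import org.apache.parquet.schema.Types

    // Hypothetical standalone example; names are illustrative only.
    object TimestampAnnotationSketch {
      def main(args: Array[String]): Unit = {
        // Both timestamp flavors are stored as plain INT64 columns; only the
        // logical-type annotation tells readers how to interpret the Long.
        val schema: MessageType = Types.buildMessage()
          .optional(INT64).as(TIMESTAMP_MILLIS).named("ts_millis")
          .optional(INT64).as(TIMESTAMP_MICROS).named("ts_micros")
          .named("spark_schema")

        // Prints:
        //   message spark_schema {
        //     optional int64 ts_millis (TIMESTAMP_MILLIS);
        //     optional int64 ts_micros (TIMESTAMP_MICROS);
        //   }
        println(schema)
      }
    }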