Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19702#discussion_r151570606
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala ---
    @@ -372,23 +381,18 @@ private[parquet] class ParquetSchemaConverter(
           // `TIMESTAMP_MICROS` which are both logical types annotating `INT64`.
           //
           // Originally, Spark SQL uses the same nanosecond timestamp type as Impala and Hive.  Starting
    -      // from Spark 1.5.0, we resort to a timestamp type with 100 ns precision so that we can store
    -      // a timestamp into a `Long`.  This design decision is subject to change though, for example,
    -      // we may resort to microsecond precision in the future.
    -      //
    -      // For Parquet, we plan to write all `TimestampType` value as `TIMESTAMP_MICROS`, but it's
    -      // currently not implemented yet because parquet-mr 1.8.1 (the version we're currently using)
    -      // hasn't implemented `TIMESTAMP_MICROS` yet, however it supports TIMESTAMP_MILLIS. We will
    -      // encode timestamp values as TIMESTAMP_MILLIS annotating INT64 if
    -      // 'spark.sql.parquet.int64AsTimestampMillis' is set.
    -      //
    -      // TODO Converts `TIMESTAMP_MICROS` once parquet-mr implements that.
    -
    -      case TimestampType if writeTimestampInMillis =>
    -        Types.primitive(INT64, repetition).as(TIMESTAMP_MILLIS).named(field.name)
    -
    +      // from Spark 1.5.0, we resort to a timestamp type with microsecond precision so that we can
    --- End diff --
    
    I don't think Spark ever had 100 ns precision; cc @liancheng to confirm.
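    
    For reference, the mapping under discussion uses parquet-mr's `Types` builder to emit an `INT64` field annotated with the `TIMESTAMP_MILLIS` logical type. Below is a minimal, self-contained sketch of that encoding outside the converter; the field name `event_time`, the `OPTIONAL` repetition, and the message name `spark_schema` are illustrative only, not taken from this PR:
    
        import org.apache.parquet.schema.MessageType
        import org.apache.parquet.schema.OriginalType.TIMESTAMP_MILLIS
        import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64
        import org.apache.parquet.schema.Type.Repetition
        import org.apache.parquet.schema.Types
    
        // Build the field shape the converter produces for TimestampType when
        // millisecond encoding is enabled: a 64-bit integer column whose logical
        // type marks values as milliseconds since the Unix epoch.
        val tsField = Types.primitive(INT64, Repetition.OPTIONAL)
          .as(TIMESTAMP_MILLIS)
          .named("event_time")  // hypothetical field name, for illustration
    
        // Wrap the field in a message type to inspect the resulting schema.
        val schema: MessageType = Types.buildMessage()
          .addField(tsField)
          .named("spark_schema")
    
        println(schema)
        // message spark_schema {
        //   optional int64 event_time (TIMESTAMP_MILLIS);
        // }
    
    Since Spark's internal `TimestampType` carries microsecond precision, writing `TIMESTAMP_MILLIS` drops the sub-millisecond digits; encoding as `TIMESTAMP_MICROS` (once parquet-mr supports it) avoids that loss.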

