[GitHub] [spark] beliefer commented on a change in pull request #34495: [SPARK-36182][SQL] Support TimestampNTZ type in Parquet data source

GitBox Sun, 07 Nov 2021 03:39:16 -0800


beliefer commented on a change in pull request #34495:
URL: https://github.com/apache/spark/pull/34495#discussion_r744243576




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
##########
@@ -370,6 +370,31 @@ private[parquet] class ParquetRowConverter(
           }
         }
 
+      // The converter doesn't support the TimestampLTZ Parquet type and TimestampNTZ Catalyst type.
+      // This is to avoid mistakes in reading the timestamp values.
+      case TimestampNTZType
+        if parquetType.asPrimitiveType().getPrimitiveTypeName == INT64 &&
+          parquetType.getLogicalTypeAnnotation.isInstanceOf[TimestampLogicalTypeAnnotation] &&
+          !parquetType.getLogicalTypeAnnotation
+            .asInstanceOf[TimestampLogicalTypeAnnotation].isAdjustedToUTC &&
+          parquetType.getLogicalTypeAnnotation
+            .asInstanceOf[TimestampLogicalTypeAnnotation].getUnit == TimeUnit.MICROS =>
+        new ParquetPrimitiveConverter(updater)
+
+      case TimestampNTZType
+        if parquetType.asPrimitiveType().getPrimitiveTypeName == INT64 &&
+          parquetType.getLogicalTypeAnnotation.isInstanceOf[TimestampLogicalTypeAnnotation] &&
+          !parquetType.getLogicalTypeAnnotation
+            .asInstanceOf[TimestampLogicalTypeAnnotation].isAdjustedToUTC &&
+          parquetType.getLogicalTypeAnnotation
+            .asInstanceOf[TimestampLogicalTypeAnnotation].getUnit == TimeUnit.MILLIS =>

Review comment:
       This condition is almost the same as the one above (only the time unit differs). Could you extract the shared part?
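
One way the extraction could look is sketched below. This is only an illustration, not code from the PR: the object name `TimestampNTZChecks` and the helper `isNtzTimestamp` are hypothetical, and the `isPrimitive` guard is an extra safety check that the original conditions do not have. The Parquet calls are the same ones used in the diff above.

```scala
import org.apache.parquet.schema.Type
import org.apache.parquet.schema.LogicalTypeAnnotation.{TimeUnit, TimestampLogicalTypeAnnotation}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64

// Hypothetical helper object, named here for illustration only.
object TimestampNTZChecks {
  // True when `parquetType` is an INT64 column annotated as a timestamp that
  // is NOT adjusted to UTC and uses the requested time unit.
  def isNtzTimestamp(parquetType: Type, unit: TimeUnit): Boolean =
    // Extra guard (not in the PR): asPrimitiveType() throws on group types.
    parquetType.isPrimitive &&
      parquetType.asPrimitiveType().getPrimitiveTypeName == INT64 &&
      (parquetType.getLogicalTypeAnnotation match {
        // Pattern matching replaces the isInstanceOf/asInstanceOf pairs.
        case ts: TimestampLogicalTypeAnnotation =>
          !ts.isAdjustedToUTC && ts.getUnit == unit
        case _ => false
      })
}
```

With such a helper, the two guards would shrink to `case TimestampNTZType if isNtzTimestamp(parquetType, TimeUnit.MICROS) =>` and `case TimestampNTZType if isNtzTimestamp(parquetType, TimeUnit.MILLIS) =>`, so the only visible difference between the branches is the time unit.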

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
##########
@@ -223,6 +223,12 @@ class ParquetWriteSupport extends WriteSupport[InternalRow] with Logging {
               recordConsumer.addLong(millis)
         }
 
+      case TimestampNTZType =>
+        // For TimestampNTZType column, Spark always output as INT64 with Timestamp annotation in
+        // MICROS time unit.
+        (row: SpecializedGetters, ordinal: Int) =>
+          recordConsumer.addLong(row.getLong(ordinal))

Review comment:
       (row: SpecializedGetters, ordinal: Int) => recordConsumer.addLong(row.getLong(ordinal))
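
For context on why the writer can pass the long through unchanged: the sketch below assumes that a TimestampNTZ value is held as a count of microseconds since 1970-01-01T00:00:00, with no time zone applied, which already matches the INT64 MICROS layout named in the code comment. The object `NtzMicros` and its `toMicros` method are hypothetical names for this illustration and are not Spark APIs.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical illustration, not Spark code.
object NtzMicros {
  // Microseconds between 1970-01-01T00:00:00 and `ldt`, treating both as
  // plain local date-times. ZoneOffset.UTC is used only as a neutral offset
  // for the arithmetic; no time-zone conversion of the value is implied.
  def toMicros(ldt: LocalDateTime): Long = {
    val seconds = ldt.toEpochSecond(ZoneOffset.UTC)
    val microsOfSecond = ldt.getNano / 1000L
    Math.addExact(Math.multiplyExact(seconds, 1000000L), microsOfSecond)
  }
}
```

Under that assumption, a value produced this way needs no further scaling before `recordConsumer.addLong`, which is consistent with the one-line lambda suggested in the comment.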




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


