Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21975#discussion_r207542400

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java ---
@@ -174,13 +174,15 @@ void readBatch(int total, WritableColumnVector column) throws IOException {
       // TIMESTAMP_MILLIS encoded as INT64 can't be lazily decoded as we need to post process
       // the values to add microseconds precision.
+      PrimitiveType.PrimitiveTypeName typeName =
+          descriptor.getPrimitiveType().getPrimitiveTypeName();
       if (column.hasDictionary() || (rowId == 0 &&
-          (descriptor.getType() == PrimitiveType.PrimitiveTypeName.INT32 ||
-          (descriptor.getType() == PrimitiveType.PrimitiveTypeName.INT64 &&
+          (typeName == PrimitiveType.PrimitiveTypeName.INT32 ||
--- End diff --

Same comment about Parquet, though I remain a little uneasy about it. I just know it's somewhat more likely for a different, older Parquet to come in at runtime, and if we're changing to use a newer field that doesn't exist in older versions, that could break. That said, I only didn't touch it because I didn't investigate. I also think that if an older Parquet is on the classpath we have bigger problems, potentially. As long as the newer field/method has existed for several Parquet versions, this seems OK.
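For context, the refactor under review hoists the repeated chained getter into a local so the compound condition reads cleanly. Here is a minimal, self-contained sketch of that pattern; `PrimitiveTypeName`, `ColumnDescriptorStub`, and `canDecodeLazily` are hypothetical stand-ins (the real types live in Parquet's `org.apache.parquet.schema` / `org.apache.parquet.column` packages, and the real condition in `VectorizedColumnReader` includes further timestamp handling not shown here):

```java
// Hypothetical stand-in for Parquet's PrimitiveType.PrimitiveTypeName enum.
enum PrimitiveTypeName { INT32, INT64, BINARY }

// Hypothetical stand-in for a column descriptor exposing its primitive type.
class ColumnDescriptorStub {
    private final PrimitiveTypeName typeName;

    ColumnDescriptorStub(PrimitiveTypeName typeName) {
        this.typeName = typeName;
    }

    // Mirrors descriptor.getPrimitiveType().getPrimitiveTypeName() in newer Parquet.
    PrimitiveTypeName getPrimitiveTypeName() {
        return typeName;
    }
}

public class LazyDecodeCheck {
    // Hoisting the type lookup into a local, as the diff does, avoids repeating
    // the chained getter inside the compound condition. Simplified illustration only.
    static boolean canDecodeLazily(boolean hasDictionary, int rowId,
                                   ColumnDescriptorStub descriptor) {
        PrimitiveTypeName typeName = descriptor.getPrimitiveTypeName();
        return hasDictionary
            || (rowId == 0
                && (typeName == PrimitiveTypeName.INT32
                    || typeName == PrimitiveTypeName.INT64));
    }

    public static void main(String[] args) {
        System.out.println(
            canDecodeLazily(false, 0, new ColumnDescriptorStub(PrimitiveTypeName.INT32)));
        System.out.println(
            canDecodeLazily(false, 5, new ColumnDescriptorStub(PrimitiveTypeName.BINARY)));
    }
}
```

The compatibility worry in the comment is that `getPrimitiveType()` is the newer accessor; an older `parquet-column` jar on the classpath that only has the deprecated `descriptor.getType()` would fail at runtime with a `NoSuchMethodError`.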
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org