Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21975#discussion_r207542400
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java ---
    @@ -174,13 +174,15 @@ void readBatch(int total, WritableColumnVector column) throws IOException {
     
             // TIMESTAMP_MILLIS encoded as INT64 can't be lazily decoded as we need to post process
             // the values to add microseconds precision.
    +        PrimitiveType.PrimitiveTypeName typeName =
    +          descriptor.getPrimitiveType().getPrimitiveTypeName();
             if (column.hasDictionary() || (rowId == 0 &&
    -            (descriptor.getType() == PrimitiveType.PrimitiveTypeName.INT32 ||
    -            (descriptor.getType() == PrimitiveType.PrimitiveTypeName.INT64 &&
    +            (typeName == PrimitiveType.PrimitiveTypeName.INT32 ||
    --- End diff --
    
    Same comment about Parquet, though I remain a little uneasy about it. It's somewhat more likely for a different, older Parquet to end up on the classpath at runtime, and if we're changing to use a newer field or method that doesn't exist in older versions, that could break. That said, I only left this alone because I didn't investigate it. Then again, if an older Parquet is on the classpath we potentially have bigger problems anyway. And since the newer field/method has existed for several versions, this seems OK.
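    To illustrate the concern above (this is not what the PR does, just a sketch): whether a newer accessor like `getPrimitiveType()` resolves at runtime depends on which Parquet jar is actually on the classpath, and one way to probe that is reflection. The `MethodProbe` class and its method names below are hypothetical examples, not Spark code.

    ```java
    // Hypothetical sketch: probe whether a class on the classpath declares a
    // given public method, instead of failing with NoSuchMethodError at the
    // first call site.
    public class MethodProbe {

        // Returns true if className is loadable and exposes a public method
        // with the given name (any parameter list); false otherwise.
        public static boolean hasMethod(String className, String methodName) {
            try {
                Class<?> cls = Class.forName(className);
                for (java.lang.reflect.Method m : cls.getMethods()) {
                    if (m.getName().equals(methodName)) {
                        return true;
                    }
                }
                return false;
            } catch (ClassNotFoundException e) {
                // The class itself is missing, e.g. an older or absent jar.
                return false;
            }
        }

        public static void main(String[] args) {
            // With a sufficiently new parquet-column jar on the classpath this
            // prints true; with an older one (or none at all), false.
            System.out.println(hasMethod(
                "org.apache.parquet.column.ColumnDescriptor", "getPrimitiveType"));
        }
    }
    ```

    In practice Spark pins its Parquet version at build time rather than probing at runtime, so a mismatch like this only bites when a user-provided older jar shadows the bundled one.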


---
