[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...

dongjoon-hyun Sun, 13 May 2018 14:17:29 -0700

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21295#discussion_r187813544
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java ---
    @@ -147,7 +147,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
         this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString);
         this.reader = new ParquetFileReader(
             configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
    -    for (BlockMetaData block : blocks) {
    +    // use the blocks from the reader in case some do not match filters and will not be read
    --- End diff --
    
    What I mean is that this patch is logically okay, but it is only valid for the 
`master` branch (Spark 2.4 with Parquet 1.10.0). For example, the test case will pass in 
`branch-2.3` without this patch because that branch uses Parquet 1.8.X. As you 
mentioned, it would have been great if we had included this patch in #21070.
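
To illustrate why the row count can depend on which block list is iterated, here is a minimal, self-contained sketch. It does not use the Parquet API: `RowGroup`, `rowGroupsAfterFilter`, and the `value > threshold` filter are hypothetical stand-ins, and the assumption (taken from the diff comment above) is that the reader may drop row groups that cannot match the pushed-down filter, so summing over the footer's blocks could overcount.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowCountSketch {
    // Hypothetical stand-in for row-group metadata; NOT Parquet's BlockMetaData.
    static final class RowGroup {
        final long rowCount;
        final long maxValue; // max statistic of the filtered column (assumed)

        RowGroup(long rowCount, long maxValue) {
            this.rowCount = rowCount;
            this.maxValue = maxValue;
        }
    }

    // Hypothetical model of a reader that keeps only row groups whose
    // statistics could satisfy the filter "value > threshold".
    static List<RowGroup> rowGroupsAfterFilter(List<RowGroup> footerBlocks, long threshold) {
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup group : footerBlocks) {
            if (group.maxValue > threshold) {
                kept.add(group);
            }
        }
        return kept;
    }

    // Sums the row counts of whichever block list is passed in.
    static long totalRowCount(List<RowGroup> blocks) {
        long total = 0;
        for (RowGroup group : blocks) {
            total += group.rowCount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<RowGroup> footerBlocks = Arrays.asList(
            new RowGroup(100, 5),   // cannot match "value > 10"
            new RowGroup(200, 50)); // may match "value > 10"
        List<RowGroup> readerBlocks = rowGroupsAfterFilter(footerBlocks, 10);

        System.out.println("from footer blocks: " + totalRowCount(footerBlocks));
        System.out.println("from reader blocks: " + totalRowCount(readerBlocks));
    }
}
```

In this model the two totals differ only when the reader actually drops a row group; if the reader returns every footer block, both sums agree, which would be consistent with the test passing on `branch-2.3` without the patch.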



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
