Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21086#discussion_r188504187

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -351,12 +338,26 @@ class ParquetFileFormat
     val timestampConversion: Boolean =
       sparkSession.sessionState.conf.isParquetINT96TimestampConversion
     val capacity = sqlConf.parquetVectorizedReaderBatchSize
+    val enableParquetFilterPushDown: Boolean =
+      sparkSession.sessionState.conf.parquetFilterPushDown
     // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
     val returningBatch = supportBatch(sparkSession, resultSchema)

     (file: PartitionedFile) => {
       assert(file.partitionValues.numFields == partitionSchema.size)

+      // Try to push down filters when filter push-down is enabled.
--- End diff --

Now the code is inside the read function, which is executed on the executor side, so we don't need to serialize `ParquetFilters`.
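For illustration, here is a minimal, self-contained sketch of the closure-serialization point being made: only the values captured by the returned function (here, a plain `Boolean`) are shipped to executors, while objects constructed inside the function body are instantiated per task on the executor and therefore never need to be `Serializable`. The names `buildReader` and `NonSerializableHelper` are hypothetical stand-ins, not the actual Spark internals.

```scala
// Hypothetical sketch of the driver/executor split discussed above.
object ClosureSerializationSketch {

  // Stand-in for ParquetFilters: note it does NOT extend Serializable.
  class NonSerializableHelper(pushDownEnabled: Boolean) {
    def describe(path: String): String =
      if (pushDownEnabled) s"pushing filters for $path"
      else s"no push-down for $path"
  }

  // Runs on the driver; returns a function that runs on executors.
  def buildReader(pushDownEnabled: Boolean): String => String = {
    // Only `pushDownEnabled` is captured by the closure, so only
    // this cheap Boolean is serialized and sent over the wire.
    (path: String) => {
      // Executed per task on the executor side: the helper is
      // created locally and never has to be serialized.
      val helper = new NonSerializableHelper(pushDownEnabled)
      helper.describe(path)
    }
  }

  def main(args: Array[String]): Unit = {
    val reader = buildReader(pushDownEnabled = true)
    println(reader("/tmp/part-00000.parquet"))
  }
}
```

Under this reading, moving the filter construction inside the per-file function is what removes the serialization requirement on `ParquetFilters` itself.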