[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21224#discussion_r185988219

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -342,6 +342,7 @@ class ParquetFileFormat
           sparkSession.sessionState.conf.parquetFilterPushDown
         // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
         val returningBatch = supportBatch(sparkSession, resultSchema)
    +    val pushDownDate = sqlConf.parquetFilterPushDownDate
    --- End diff --

Ah, I see. Thank you, @cloud-fan !

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21224
[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21224#discussion_r185975883

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -342,6 +342,7 @@ class ParquetFileFormat
           sparkSession.sessionState.conf.parquetFilterPushDown
         // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
         val returningBatch = supportBatch(sparkSession, resultSchema)
    +    val pushDownDate = sqlConf.parquetFilterPushDownDate
    --- End diff --

No, we can't; see https://github.com/apache/spark/pull/21086
[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21224#discussion_r185876764

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -342,6 +342,7 @@ class ParquetFileFormat
           sparkSession.sessionState.conf.parquetFilterPushDown
         // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
         val returningBatch = supportBatch(sparkSession, resultSchema)
    +    val pushDownDate = sqlConf.parquetFilterPushDownDate
    --- End diff --

Can we pass `pushed` instead of declaring a new `pushDownDate`? The following can be handled at line 345 here.

```scala
// Try to push down filters when filter push-down is enabled.
val pushed =
  if (enableParquetFilterPushDown) {
    filters
      // Collects all converted Parquet filter predicates. Notice that not all predicates can be
      // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
      // is used here.
      .flatMap(new ParquetFilters(pushDownDate).createFilter(requiredSchema, _))
      .reduceOption(FilterApi.and)
  } else {
    None
  }
```
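For readers unfamiliar with the `flatMap` / `reduceOption` idiom in the snippet above, here is a minimal, self-contained sketch of the same shape. It does not use Spark or Parquet classes: `Pred`, `Leaf`, `And`, `convert`, and `pushed` are hypothetical stand-ins invented for illustration, and the "unsupported" prefix is just an assumed way to simulate a predicate that cannot be converted.

```scala
object PushDownSketch {
  // Hypothetical toy predicate type standing in for a converted filter.
  sealed trait Pred
  final case class Leaf(expr: String) extends Pred
  final case class And(left: Pred, right: Pred) extends Pred

  // Hypothetical converter: returns None when the input cannot be converted,
  // mirroring a createFilter-style method that returns an Option.
  def convert(source: String): Option[Pred] =
    if (source.startsWith("unsupported")) None else Some(Leaf(source))

  // flatMap drops the None results; reduceOption combines what is left and
  // yields None when nothing was convertible (or when push-down is disabled).
  def pushed(enabled: Boolean, filters: Seq[String]): Option[Pred] =
    if (enabled) {
      filters
        .flatMap(f => convert(f))
        .reduceOption((a, b) => And(a, b))
    } else {
      None
    }
}
```

With this toy model, a list containing one unconvertible entry still produces a combined predicate from the remaining entries, while an empty or fully unconvertible list produces `None` rather than throwing.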
[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21224

    [SPARK-24167][SQL] ParquetFilters should not access SQLConf at executor side

    ## What changes were proposed in this pull request?

    This PR is extracted from #21190 to make it easier to backport.

    `ParquetFilters` is used in the file scan function, which is executed on the executor side, so we can't call `conf.parquetFilterPushDownDate` there.

    ## How was this patch tested?

    It's tested in #21190.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21224

----
commit c58baad051259d7d2d54f1eb5e84c4bdac0867a6
Author: Wenchen Fan
Date:   2018-05-03T05:20:06Z

    ParquetFilters should not access SQLConf at executor side
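The pattern described in this pull request (read the setting once where the conf is available, then hand the plain value to the code that runs on the executor side) can be sketched without Spark as follows. This is only an illustration under stated assumptions: `Conf`, `DateAwareFilters`, and `ReaderFactory` are hypothetical names, not Spark's classes, and the string-based "filter" is a placeholder for a real predicate.

```scala
// Hypothetical stand-in for a conf object that is only usable on the driver.
final case class Conf(parquetFilterPushDownDate: Boolean)

// The filter builder takes the flag as a constructor argument instead of
// looking up a conf itself, so it has no dependency on driver-side state.
final class DateAwareFilters(pushDownDate: Boolean) {
  def createFilter(columnType: String, predicate: String): Option[String] =
    columnType match {
      case "date" if !pushDownDate => None // date push-down disabled
      case _                       => Some(predicate)
    }
}

object ReaderFactory {
  // Runs where the conf is available. The returned closure captures only the
  // Boolean, so calling it later never touches Conf.
  def buildReader(conf: Conf): (String, String) => Option[String] = {
    val pushDownDate = conf.parquetFilterPushDownDate
    (columnType, predicate) =>
      new DateAwareFilters(pushDownDate).createFilter(columnType, predicate)
  }
}
```

In this sketch, `ReaderFactory.buildReader(Conf(parquetFilterPushDownDate = false))` returns a function that yields `None` for a `"date"` column and `Some(predicate)` for other column types, without ever reading the conf inside the returned function.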