GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/20988
[SPARK-23877][SQL]: Use filter predicates to prune partitions in metadata-only queries

## What changes were proposed in this pull request?

This updates the OptimizeMetadataOnlyQuery rule to use filter expressions when listing partitions, if there are filter nodes in the logical plan. This avoids listing all partitions for large tables on the driver.

This also fixes a minor bug where the partitions returned from fsRelation cannot be serialized without hitting a "stack level too deep" error. This is caused by serializing a stream to executors: the stream is a recursive structure, and if it is too long, the serialization stack reaches the maximum depth. The fix is to create a LocalRelation using an Array instead of the incoming Seq.

## How was this patch tested?

Existing tests for metadata-only queries.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-23877-metadata-only-push-filters

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20988.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20988

----

commit 552efaee05b64f9ed4d5496b3b1d11b57b985f85
Author: Ryan Blue <blue@...>
Date: 2018-03-14T21:50:11Z

    Support filter conditions in metadata-only queries.

commit 2345896288828aefe14ebcb370d374b348400cf4
Author: Ryan Blue <blue@...>
Date: 2018-03-14T22:43:56Z

    Ensure partition data is an Array.

    The LocalRelation created for partition data for metadata-only queries may be a stream produced by listing directories. If the stream is large, serializing the LocalRelation to executors results in a stack overflow because the stream is a recursive structure of (head, rest-of-stream).
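The serialization problem fixed by the second commit can be illustrated with a minimal sketch in plain Scala, with no Spark dependency (the object and method names here are hypothetical, chosen only for the demo): a long Stream is a recursive (head, rest-of-stream) structure, so default Java serialization descends one stack frame per element and can overflow for large partition listings, while an Array is a flat structure that serializes safely.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object StreamSerDemo {
  // Serialize a value with Java serialization and read it back.
  def roundTrip(value: AnyRef): AnyRef = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject()
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for a large partition listing produced lazily.
    val partitions: Stream[Int] = Stream.from(0).take(100000)

    // roundTrip(partitions)  // recursive structure: may throw StackOverflowError

    // Materializing to an Array first (as the PR does before building the
    // LocalRelation) gives a flat structure that serializes without recursion.
    val copy = roundTrip(partitions.toArray).asInstanceOf[Array[Int]]
    println(copy.length) // prints 100000
  }
}
```

This mirrors the shape of the fix: convert the incoming Seq (possibly a Stream) to an Array before handing the data to anything that will be serialized to executors.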