Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22905#discussion_r232486218

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -306,7 +306,15 @@ case class FileSourceScanExec(
           withOptPartitionCount
         }

    -    withSelectedBucketsCount
    +    val withOptColumnCount = relation.fileFormat match {
    +      case columnar: ColumnarFileFormat =>
    +        val sqlConf = relation.sparkSession.sessionState.conf
    +        val columnCount = columnar.columnCountForSchema(sqlConf, requiredSchema)
    +        withSelectedBucketsCount + ("ColumnCount" -> columnCount.toString)
--- End diff --

1. The column pruning here is now specific to Parquet; it's source-specific for now.
2. I really think it's more appropriate to check whether something behaves as expected via logging.

> That's speaking from experience, not conjecture.

I am not underestimating your statement. Let's be very clear about why it should be put in metadata rather than in logging. How and why can it be useful? In what cases?
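To make the pattern under discussion concrete, here is a minimal, self-contained sketch of what the diff does: a columnar format exposes a column count for a pruned schema, and the scan node folds that count into its metadata map. All names here (`ColumnarFileFormat`, `columnCountForSchema`, the stand-in schema types) are simplified stand-ins for the Spark internals in the diff, not the real API.

```scala
// Simplified stand-ins for Spark's schema types (illustrative only).
case class StructField(name: String)
case class StructType(fields: Seq[StructField])

// Hypothetical trait mirroring the ColumnarFileFormat in the diff:
// a columnar source can report how many physical columns it will read
// for a pruned (required) schema.
trait ColumnarFileFormat {
  def columnCountForSchema(schema: StructType): Int
}

// A Parquet-like format where each top-level field maps to one column.
object ParquetLikeFormat extends ColumnarFileFormat {
  def columnCountForSchema(schema: StructType): Int = schema.fields.size
}

object ColumnCountSketch {
  // Mirrors the diff's match: only columnar formats add "ColumnCount"
  // to the scan node's metadata; other formats leave it untouched.
  def withColumnCount(
      metadata: Map[String, String],
      format: Any,
      requiredSchema: StructType): Map[String, String] =
    format match {
      case columnar: ColumnarFileFormat =>
        val count = columnar.columnCountForSchema(requiredSchema)
        metadata + ("ColumnCount" -> count.toString)
      case _ => metadata
    }
}
```

Under this sketch, a scan over a two-field required schema would carry `"ColumnCount" -> "2"` in its metadata, which is the kind of information the review debates surfacing in metadata versus a log line.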