Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22905#discussion_r230249905

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -306,7 +306,15 @@ case class FileSourceScanExec(
               withOptPartitionCount
             }

    -    withSelectedBucketsCount
    +    val withOptColumnCount = relation.fileFormat match {
    +      case columnar: ColumnarFileFormat =>
    +        val sqlConf = relation.sparkSession.sessionState.conf
    +        val columnCount = columnar.columnCountForSchema(sqlConf, requiredSchema)
    +        withSelectedBucketsCount + ("ColumnCount" -> columnCount.toString)
    --- End diff --

    The purpose of this info is to check the number of columns actually selected, and that information can be shown via logging, no? Why should it be exposed in metadata then? Maybe add debug logging that shows the number of columns actually being selected by the underlying source.
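
    A rough sketch of the logging alternative suggested above, not code from this PR. `Field`, `Schema`, `FileFormat`, `ScanMetadata` and `metadataWithDebugLog` are hypothetical stand-ins so the snippet compiles on its own; `ColumnarFileFormat.columnCountForSchema` is simplified to drop the `sqlConf` argument shown in the diff; and `java.util.logging` is used only to avoid a Spark dependency (inside Spark one would presumably call the existing `logDebug` helper instead).

```scala
import java.util.logging.{Level, Logger}

// Hypothetical stand-ins for the Spark types in the diff; not Spark's real definitions.
final case class Field(name: String)
final case class Schema(fields: Seq[Field])

trait FileFormat
trait ColumnarFileFormat extends FileFormat {
  // Simplified signature (assumption): the diff also passes a SQLConf.
  def columnCountForSchema(schema: Schema): Int
}

object ScanMetadata {
  private val log = Logger.getLogger("ScanMetadata")

  // Returns the metadata map unchanged; the column count is only
  // reported at debug (FINE) level instead of being added as an entry.
  def metadataWithDebugLog(
      format: FileFormat,
      requiredSchema: Schema,
      metadata: Map[String, String]): Map[String, String] = {
    format match {
      case columnar: ColumnarFileFormat =>
        // Guard so the count is computed only when debug logging is enabled.
        if (log.isLoggable(Level.FINE)) {
          val count = columnar.columnCountForSchema(requiredSchema)
          log.fine(s"Columns selected by the underlying source: $count")
        }
      case _ =>
        () // Non-columnar formats have nothing to report.
    }
    metadata
  }
}
```

    With this shape the plan metadata stays as it was, and the column count shows up only when debug-level logging is turned on for the scan.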