GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22905#discussion_r230249905
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -306,7 +306,15 @@ case class FileSourceScanExec(
           withOptPartitionCount
         }
     
    -    withSelectedBucketsCount
    +    val withOptColumnCount = relation.fileFormat match {
    +      case columnar: ColumnarFileFormat =>
    +        val sqlConf = relation.sparkSession.sessionState.conf
    +        val columnCount = columnar.columnCountForSchema(sqlConf, requiredSchema)
    +        withSelectedBucketsCount + ("ColumnCount" -> columnCount.toString)
    --- End diff --
    
    The purpose of this info is to check how many columns are actually selected, and that information can be shown via logging, no? Why should it be exposed in the metadata, then?
    
    Maybe add debug logging that shows the number of columns actually being selected via the underlying source.
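A minimal sketch of the suggested alternative, in plain Scala without Spark on the classpath. `ColumnCounting`, `ExampleColumnarFormat`, and `ScanMetadata.withDebugColumnCount` are hypothetical names, the stand-in trait drops the `sqlConf` argument that `columnCountForSchema` takes in the diff, and `java.util.logging` at `FINE` level stands in for debug logging. The metadata map is returned unchanged; the column count only appears when debug output is enabled.

```scala
import java.util.logging.{Level, Logger}

// Hypothetical stand-in for the ColumnarFileFormat trait in the diff; the real
// method also takes a SQLConf and a StructType, which are simplified away here.
trait ColumnCounting {
  def columnCountForSchema(requiredSchema: Seq[String]): Int
}

// Hypothetical columnar format that simply counts the requested top-level columns.
object ExampleColumnarFormat extends ColumnCounting {
  def columnCountForSchema(requiredSchema: Seq[String]): Int = requiredSchema.size
}

object ScanMetadata {
  // java.util.logging stands in for Spark's logging; FINE plays the role of DEBUG.
  private val log = Logger.getLogger("FileSourceScanExec")

  /** Returns the scan metadata unchanged and reports the column count only in debug logs. */
  def withDebugColumnCount(
      metadata: Map[String, String],
      fileFormat: Any,
      requiredSchema: Seq[String]): Map[String, String] = {
    fileFormat match {
      // Only compute the count when debug-level output would actually be emitted.
      case columnar: ColumnCounting if log.isLoggable(Level.FINE) =>
        val columnCount = columnar.columnCountForSchema(requiredSchema)
        log.fine(s"Number of columns selected by the underlying source: $columnCount")
      case _ => // non-columnar format, or debug logging disabled: nothing to report
    }
    metadata
  }
}
```

For example, `ScanMetadata.withDebugColumnCount(Map("Format" -> "Parquet"), ExampleColumnarFormat, Seq("id", "name"))` returns the same map with no `"ColumnCount"` entry. Inside Spark itself, physical plan nodes mix in Spark's internal `Logging` trait, so the equivalent would presumably be a `logDebug(...)` call in place of the `"ColumnCount"` metadata entry.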

