Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22905#discussion_r232486141
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -306,7 +306,15 @@ case class FileSourceScanExec(
           withOptPartitionCount
         }
     
    -    withSelectedBucketsCount
    +    val withOptColumnCount = relation.fileFormat match {
    +      case columnar: ColumnarFileFormat =>
    +        val sqlConf = relation.sparkSession.sessionState.conf
    +        val columnCount = columnar.columnCountForSchema(sqlConf, requiredSchema)
    +        withSelectedBucketsCount + ("ColumnCount" -> columnCount.toString)
    --- End diff ---
    
    > Who wants that? If someone wants to put metadata somewhere in the physical plan, let them open a PR and make a case for it
    
    No, I don't think we should add it only because it was requested once. These requests look like the same kind of case to me, and I would have no grounds to object if this one is added and other people later ask to add other fields. We should make it clear why this one in particular should be added; we're not going to add every piece of information to the metadata just because it is requested.
    
    If the purpose of adding it is to check whether the pushdown is actually working, logging sounds more appropriate for that purpose.
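    
    To illustrate, a rough sketch of the logging alternative I have in mind (hypothetical; it reuses the `ColumnarFileFormat` trait and `columnCountForSchema` from this PR's diff, and assumes the plan node already mixes in Spark's `Logging` trait, as `SparkPlan` does):
    
        // Hypothetical sketch: log the effective column count instead of
        // exposing it through the physical plan's metadata map.
        relation.fileFormat match {
          case columnar: ColumnarFileFormat =>
            val sqlConf = relation.sparkSession.sessionState.conf
            val columnCount = columnar.columnCountForSchema(sqlConf, requiredSchema)
            logDebug(s"Columnar scan will read $columnCount column(s) for the required schema")
          case _ => // nothing to report for non-columnar formats
        }
    
    That would still let someone check from the logs that the pushdown is effective, without committing to a new metadata key in the plan.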


---
