Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22905#discussion_r232486218
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -306,7 +306,15 @@ case class FileSourceScanExec(
           withOptPartitionCount
         }
     
    -    withSelectedBucketsCount
    +    val withOptColumnCount = relation.fileFormat match {
    +      case columnar: ColumnarFileFormat =>
    +        val sqlConf = relation.sparkSession.sessionState.conf
    +        val columnCount = columnar.columnCountForSchema(sqlConf, requiredSchema)
    +        withSelectedBucketsCount + ("ColumnCount" -> columnCount.toString)
    --- End diff ---
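
    For context, the match arm above assumes a `ColumnarFileFormat` trait exposing `columnCountForSchema`. Here is a minimal self-contained sketch of what that contract could look like; the trait name and method signature come straight from the diff, but the leaf-counting implementation below is an illustrative assumption, not necessarily what this PR does:

    ```scala
    import org.apache.spark.sql.internal.SQLConf
    import org.apache.spark.sql.types._

    // Sketch of the trait the match arm relies on: columnar file formats
    // report how many physical columns a given required schema touches.
    trait ColumnarFileFormat {
      def columnCountForSchema(sqlConf: SQLConf, schema: StructType): Int
    }

    // Illustrative (assumed) implementation: count leaf fields, since a
    // nested struct maps to several physical columns in Parquet-like formats.
    object LeafColumnCounting extends ColumnarFileFormat {
      override def columnCountForSchema(sqlConf: SQLConf, schema: StructType): Int =
        countLeaves(schema)

      private def countLeaves(dt: DataType): Int = dt match {
        case s: StructType => s.fields.map(f => countLeaves(f.dataType)).sum
        case a: ArrayType  => countLeaves(a.elementType)
        case m: MapType    => countLeaves(m.keyType) + countLeaves(m.valueType)
        case _             => 1
      }
    }
    ```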
    
    1. The column pruning is currently specific to Parquet, so this metadata would be source-specific for now. 2. I really think logging is the more appropriate way to check whether something behaves as expected; see the sketch below.
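
    To make point 2 concrete, a minimal sketch of the logging alternative; the object name and message are hypothetical, though `org.apache.spark.internal.Logging` and `logInfo` are real Spark utilities:

    ```scala
    import org.apache.spark.internal.Logging

    // Hypothetical helper: emit the pruned column count to the logs at
    // planning time instead of surfacing it in the scan node's metadata.
    object ColumnCountLogger extends Logging {
      def logColumnCount(format: String, columnCount: Int): Unit =
        logInfo(s"$format scan reads $columnCount column(s) after pruning")
    }
    ```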
    
    > That's speaking from experience, not conjecture.
    
    I am not discounting your statement, but let's be very clear about why this should go into the node metadata rather than the logs. How and why is it useful, and in what cases?


---
