Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20831#discussion_r175980243

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -169,7 +174,10 @@ case class InMemoryTableScanExec(
       override def outputOrdering: Seq[SortOrder] =
         relation.child.outputOrdering.map(updateAttribute(_).asInstanceOf[SortOrder])

    -  private def statsFor(a: Attribute) = relation.partitionStatistics.forAttribute(a)
    +  // When we make canonicalized plan, we can't find a normalized attribute in this map.
    +  // We return a `ColumnStatisticsSchema` for normalized attribute in this case.
    --- End diff --

    I tried that at the beginning. However, `partitionFilters` uses `buildFilter`. Making `partitionFilters` a lazy val doesn't work, because on `copy` the initialization of `InMemoryTableScanExec` tries to materialize `partitionFilters` in order to copy its value. Turning `partitionFilters` and `buildFilter` into methods is not enough either: we would also need to remove `@transient` from `relation` and `InMemoryRelation.partitionStatistics`. So I don't think it is worth it, and I would leave it as is.
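
    For illustration only, here is a minimal sketch of the general Scala behaviour behind the code comment in the diff. It is not Spark code: `Scan`, `stats`, `bound` and the attribute names are hypothetical stand-ins. It assumes the lookup sits in an eager `val` in the case class body, so it runs again whenever `copy` builds a new instance, and it uses a fallback value for a key that is missing from the map.

```scala
// Hypothetical stand-in for a plan node; these are NOT Spark's real classes.
final case class Scan(attr: String, stats: Map[String, Int]) {
  // Eager body `val`: evaluated in the constructor, so every `copy` runs it again.
  // `stats(attr)` would throw NoSuchElementException for an unknown key,
  // so this sketch falls back to a placeholder value instead.
  val bound: Int = stats.getOrElse(attr, -1)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val scan = Scan("a#1", Map("a#1" -> 10))
    // Loosely mimics canonicalization: the attribute is renamed, the map is not.
    val canonical = scan.copy(attr = "none#0")
    println(scan.bound)      // 10
    println(canonical.bound) // -1
  }
}
```

    With a plain `stats(attr)` in place of `getOrElse`, the `copy` call itself would fail in this sketch, before `bound` is ever read by the caller.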