Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22353#discussion_r217165648

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala ---
    @@ -59,6 +57,12 @@ private[execution] object SparkPlanInfo {
           new SQLMetricInfo(metric.name.getOrElse(key), metric.id, metric.metricType)
         }

    -    new SparkPlanInfo(plan.nodeName, plan.simpleString, children.map(fromSparkPlan), metrics)
    +    // dump the file scan metadata (e.g file path) to event log
    --- End diff --

    As a next step of the review, have you had a chance to test this in your real environment, on at least TPCDS 1TB? In the worst case, this seems to increase the event log traffic dramatically. Can we have a comparison of the event log before and after this PR, @LantaoJin?