Hi,

I'm wondering why the metrics are repeated in FileSourceScanExec.metrics
[1], since FileSourceScanExec is a ColumnarBatchScan [2] and so already
inherits the two metrics numOutputRows and scanTime from
ColumnarBatchScan.metrics [3].

Shouldn't FileSourceScanExec.metrics be as follows then:

  override lazy val metrics = super.metrics ++ Map(
    "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of files"),
    "metadataTime" -> SQLMetrics.createMetric(sparkContext, "metadata time
(ms)"))

I'd like to send a pull request with a fix if no one objects. Anyone?

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L315-L319
[2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L164
[3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala#L38-L40

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski
