Hi All,

I tried to run a simple Spark program to find out the metrics
collected while executing it. What I observed is that I am able to get
TaskMetrics.inputMetrics data such as records read, bytes read, etc., but I
do not get any metrics about the output.


I ran the code below in local mode as well as on a YARN cluster, and the
result was the same in both cases.

This is the code I used to test:

val datadf = sqlContext.read.json("schema.txt").repartition(10)

datadf.distinct().write.mode(SaveMode.Append)
  .save("resources\\data-" + Calendar.getInstance.getTimeInMillis)


I attached the listener like this:

sc.addSparkListener(new SparkListener() {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    val metrics = taskEnd.taskMetrics
    if (metrics.inputMetrics.isDefined) {
      println("Metrics for the task Input recs: " + metrics.inputMetrics.get.recordsRead)
      inputRecords += metrics.inputMetrics.get.recordsRead
    }
    if (metrics.outputMetrics.isDefined) {
      println("Metrics for the task Output recs: " + metrics.outputMetrics.get.recordsWritten)
      outputWritten += metrics.outputMetrics.get.recordsWritten
    }
  }
})
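
For reference, here is a self-contained sketch of the whole test. It is a
sketch rather than my exact program: it assumes Spark 1.x (where
inputMetrics and outputMetrics are Options), and the AtomicLong counters
plus the final sc.stop() are additions to keep the counts thread-safe and
to drain the listener bus before printing.

import java.util.Calendar
import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.{SQLContext, SaveMode}

object MetricsTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MetricsTest"))
    val sqlContext = new SQLContext(sc)

    // Thread-safe counters: onTaskEnd fires on the listener-bus thread,
    // not on the driver's main thread.
    val inputRecords = new AtomicLong(0)
    val outputWritten = new AtomicLong(0)

    sc.addSparkListener(new SparkListener() {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val metrics = taskEnd.taskMetrics
        // Spark 1.x: both metrics fields are Option[...]
        metrics.inputMetrics.foreach(m => inputRecords.addAndGet(m.recordsRead))
        metrics.outputMetrics.foreach(m => outputWritten.addAndGet(m.recordsWritten))
      }
    })

    val datadf = sqlContext.read.json("schema.txt").repartition(10)
    datadf.distinct().write.mode(SaveMode.Append)
      .save("resources\\data-" + Calendar.getInstance.getTimeInMillis)

    // stop() shuts down the listener bus, so the remaining task-end
    // events are delivered before we read the totals.
    sc.stop()
    println("Input records: " + inputRecords.get + ", Output records: " + outputWritten.get)
  }
}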


I get valid data for the input metrics, but none for the output metrics.

Am I missing anything here? Any pointers would be much appreciated.
