Hi,

How can I measure how long the computation of an RDD takes to execute?

In particular, I want to do it for the following piece of code:
«

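import org.apache.spark.streaming.{Seconds, StreamingContext}

// sparkConf is assumed to be an already configured SparkConf
val ssc = new StreamingContext(sparkConf, Seconds(5))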
val distFile = ssc.textFileStream("/home/myuser/twitter-dump")

val words = distFile.flatMap(_.split(" ")).filter(_.length > 3)

val wordCharValues = words.map(word => {
  var sum = 0
  word.toCharArray.foreach(sum += _.toInt)
  val value = sum.toDouble / word.length.toDouble
  val average = 1892.162961
  (math.pow(value - average, 2), 1)
})
  .reduceByWindow({ case ((sum1, count1), (sum2, count2)) =>
    (sum1 + sum2, count1 + count2) }, Seconds(10), Seconds(10))

wordCharValues.foreachRDD(rdd => {
  val result = rdd.take(1)

  println("Result array size: " + result.size)
  if (result.size > 0)
    println("STDEV: %f".format(
      math.sqrt(result(0)._1.toDouble / result(0)._2.toDouble)))
})

»

Thanks.
