Hi, How can I measure the time an RDD takes to execute?
In particular, I want to do it for the following piece of code: « val ssc = new StreamingContext(sparkConf, Seconds(5)) val distFile = ssc.textFileStream("/home/myuser/twitter-dump") val words = distFile.flatMap(_.split(" ")).filter(_.length > 3) val wordCharValues = words.map(word => { var sum = 0 word.toCharArray.foreach(sum += _.toInt) val value = sum.toDouble / word.length.toDouble val average = 1892.162961 (math.pow(value - average, 2), 1) }) .reduceByWindow({ case ((sum1, count1), (sum2, count2)) => (sum1 + sum2, count1 + count2)}, Seconds(10), Seconds(10)) wordCharValues.foreachRDD(rdd => { val result = rdd.take(1) println("Result array size: " + result.size) if(result.size > 0) println("STDEV: %f".format(math.sqrt(result(0)._1.toDouble / result(0)._2.toDouble))) }) » Thanks.