Re: [Spark Structured Streaming] Measure metrics from CsvSink for Rate source

2018-06-28 Thread Dhruv Kumar
Hi, can someone please take a look at the message below? Any help is deeply appreciated. -- Dhruv Kumar, PhD Candidate, Department of Computer Science and Engineering, University of Minnesota, www.dhruvkumar.me > On Jun 22, 2018, at 13:12, Dhruv Kumar wrote: >

Re: [Spark Structured Streaming] Measure metrics from CsvSink for Rate source

2018-06-22 Thread Dhruv Kumar
I also tried the File source (reading CSV files as a stream and processing them). The File source seems to generate valid numbers in the metrics log files. I may be wrong, but this looks like an issue with how the Rate source reports metrics to the metrics log files.

Re: [Spark Structured Streaming] Measure metrics from CsvSink for Rate source

2018-06-21 Thread Dhruv Kumar
Thanks a lot for your mail, Jungtaek. I added a StreamingQueryListener to my code (updated code) and was able to see valid inputRowsPerSecond and processedRowsPerSecond numbers, but it also shows zeros intermittently. Here is the
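
One way to look into the intermittent zeros is to check whether they coincide with triggers that processed no rows. The sketch below is only a diagnostic aid, not a statement about the cause. It assumes the progress events have already been collected as plain dicts with the field names used in the StreamingQueryProgress JSON (batchId, numInputRows, inputRowsPerSecond, processedRowsPerSecond). The function name rate_report and the sample values are hypothetical.

```python
def rate_report(progress_events):
    """Extract per-batch rates and mark batches that processed no rows.

    `progress_events` is assumed to be a list of dicts shaped like the
    StreamingQueryProgress JSON. Missing or null rates are treated as 0.0.
    """
    report = []
    for p in progress_events:
        rows = int(p.get("numInputRows") or 0)
        report.append({
            "batchId": p.get("batchId"),
            "numInputRows": rows,
            "inputRowsPerSecond": float(p.get("inputRowsPerSecond") or 0.0),
            "processedRowsPerSecond": float(p.get("processedRowsPerSecond") or 0.0),
            "empty_batch": rows == 0,
        })
    return report


def zero_rate_batches(report):
    """Return batch ids whose input rate is zero, split by whether rows arrived."""
    zero = [r for r in report if r["inputRowsPerSecond"] == 0.0]
    return {
        "zero_and_empty": [r["batchId"] for r in zero if r["empty_batch"]],
        "zero_but_had_rows": [r["batchId"] for r in zero if not r["empty_batch"]],
    }


# Hypothetical sample events, for illustration only.
SAMPLE_EVENTS = [
    {"batchId": 0, "numInputRows": 0, "inputRowsPerSecond": 0.0,
     "processedRowsPerSecond": 0.0},
    {"batchId": 1, "numInputRows": 10, "inputRowsPerSecond": 1.0,
     "processedRowsPerSecond": 25.0},
    {"batchId": 2, "numInputRows": 10, "processedRowsPerSecond": 30.0},
]

if __name__ == "__main__":
    print(zero_rate_batches(rate_report(SAMPLE_EVENTS)))
```

If every zero reading lands in zero_and_empty, the zeros may simply reflect triggers with no new data. Entries in zero_but_had_rows would be the ones worth reporting on the list.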

Re: [Spark Structured Streaming] Measure metrics from CsvSink for Rate source

2018-06-21 Thread Jungtaek Lim
I'm using 2.4.0-SNAPSHOT (not sure which commit), and it properly reports the input rate.

    $ tail -F /tmp/spark-trial-metric/local-1529640063554.driver.spark.streaming.counts.inputRate-total.csv
    t,value
    1529640073,0.0
    1529640083,0.9411272613196695
    1529640093,0.9430996541967934
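
A file like this can be checked quickly without opening it by hand. The following is a minimal sketch that parses the two-column t,value layout shown above and counts how many samples are zero. The helper names read_gauge_csv and summarize are made up for this example. The sample rows are the three quoted in this message.

```python
import csv
import io
from statistics import mean


def read_gauge_csv(text):
    """Parse 't,value' rows into (epoch_seconds, value) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["t"]), float(row["value"])) for row in reader]


def summarize(samples):
    """Count zero samples and average the non-zero ones."""
    values = [value for _, value in samples]
    nonzero = [value for value in values if value > 0.0]
    return {
        "samples": len(values),
        "zero_samples": len(values) - len(nonzero),
        "mean_nonzero": mean(nonzero) if nonzero else 0.0,
    }


# The rows quoted in the message above.
SAMPLE = """t,value
1529640073,0.0
1529640083,0.9411272613196695
1529640093,0.9430996541967934
"""

if __name__ == "__main__":
    print(summarize(read_gauge_csv(SAMPLE)))
```

To run it against a real file, read the file contents into a string and pass that to read_gauge_csv instead of SAMPLE.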

[Spark Structured Streaming] Measure metrics from CsvSink for Rate source

2018-06-21 Thread Dhruv Kumar
Hi, I was trying to measure performance metrics for Spark Structured Streaming, but I am unable to see any data in the metrics log files. My input source is the Rate source
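
For anyone reproducing this setup, here is a small sketch that renders the CsvSink entries for a metrics.properties file. It is not the configuration used in this thread, which is not shown. The sink class and the period, unit and directory keys follow Spark's metrics.properties template. The function name, the default values and the example directory are assumptions. Streaming query metrics also need spark.sql.streaming.metricsEnabled set to true in the session configuration.

```python
def csv_sink_properties(directory, period=10, unit="seconds"):
    """Render CsvSink entries for a Spark metrics.properties file.

    The defaults (10 seconds) are illustrative choices, not values
    taken from this thread.
    """
    lines = [
        "*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink",
        f"*.sink.csv.period={period}",
        f"*.sink.csv.unit={unit}",
        f"*.sink.csv.directory={directory}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The directory is an example path; any writable directory works.
    print(csv_sink_properties("/tmp/spark-trial-metric"), end="")
```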