Thanks a lot for your mail, Jungtaek. I added the StreamingQueryListener to my code (updated code <https://gist.github.com/kudhru/e1ce6b3f399c546be5eeb1f590087992>) and was able to see valid inputRowsPerSecond and processedRowsPerSecond numbers, but it also shows zeros intermittently. Here is the sample output <https://gist.github.com/kudhru/db2bced789c528464620ae1767597127>. Could you explain why this is the case?

Unfortunately, the csv files still show only zeros, except for a few non-zero values. Do you know why this may be happening? (I changed metrics.properties to print every second instead of every 10 seconds.) Here is the output of the metrics log file (run_latest.driver.spark.streaming.aggregates.inputRate-total.csv):

t,value
1529645042,0.0
1529645043,0.0
1529645044,0.0
1529645045,NaN
1529645046,88967.97153024911
1529645047,100200.4008016032
1529645048,122100.12210012211
1529645049,0.0
1529645050,0.0
1529645051,0.0
1529645052,0.0
1529645053,0.0
1529645054,0.0
1529645055,0.0
1529645056,0.0
1529645057,0.0
1529645058,0.0
1529645059,0.0
1529645060,0.0
1529645061,0.0
1529645062,0.0
1529645063,0.0
1529645064,0.0
1529645065,0.0
1529645066,0.0
1529645067,0.0
1529645068,0.0
1529645069,0.0
1529645070,0.0
1529645071,0.0
1529645072,93808.63039399624
1529645073,0.0
1529645074,0.0
1529645075,0.0
1529645076,0.0
1529645077,0.0
1529645078,0.0
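For reference, a minimal sketch of such a listener (not necessarily identical to what is in the gist above; it assumes an existing SparkSession named `spark`):

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Minimal per-trigger rate logger; assumes an existing SparkSession `spark`.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    // numInputRows / inputRowsPerSecond / processedRowsPerSecond are
    // reported once per completed trigger (micro-batch), so they update
    // at trigger granularity, not wall-clock granularity.
    println(s"batch=${p.batchId} numInputRows=${p.numInputRows} " +
      s"inputRowsPerSecond=${p.inputRowsPerSecond} " +
      s"processedRowsPerSecond=${p.processedRowsPerSecond}")
  }
})
```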
--------------------------------------------------
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

> On Jun 21, 2018, at 23:07, Jungtaek Lim <kabh...@gmail.com> wrote:
>
> I'm referring to 2.4.0-SNAPSHOT (not sure which commit) but it properly
> returns the input rate.
>
> $ tail -F /tmp/spark-trial-metric/local-1529640063554.driver.spark.streaming.counts.inputRate-total.csv
> t,value
> 1529640073,0.0
> 1529640083,0.9411272613196695
> 1529640093,0.9430996541967934
> 1529640103,1.0606060606060606
> 1529640113,0.9997000899730081
>
> Could you add a streaming query listener and check the values of sources ->
> numInputRows, inputRowsPerSecond, processedRowsPerSecond? They should
> provide some valid numbers.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Jun 22, 2018 at 11:49 AM, Dhruv Kumar <gargdhru...@gmail.com <mailto:gargdhru...@gmail.com>> wrote:
> Hi
>
> I was trying to measure the performance metrics for Spark Structured
> Streaming, but I am unable to see any data in the metrics log files. My
> input source is the Rate source
> <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#creating-streaming-dataframes-and-streaming-datasets>,
> which generates data at the specified number of rows per second. Here is
> the link to my code
> <https://gist.github.com/kudhru/e1ce6b3f399c546be5eeb1f590087992> and
> metrics.properties
> <https://gist.github.com/kudhru/5d8a8f4d53c766e9efad4de2ae9b82d6> file.
>
> When I run the above-mentioned code using spark-submit, I see that the
> metrics logs (for example,
> run_1.driver.spark.streaming.aggregates.inputRate-total.csv) are created
> under the specified directory, but most of the values are 0.
> Below is a portion of the inputRate-total.csv file:
>
> 1529634585,0.0
> 1529634595,0.0
> 1529634605,0.0
> 1529634615,0.0
> 1529634625,0.0
> 1529634635,0.0
> 1529634645,0.0
> 1529634655,0.0
> 1529634665,0.0
> 1529634675,0.0
> 1529634685,0.0
> 1529634695,0.0
> 1529634705,0.0
> 1529634715,0.0
>
> Any reason as to why this might be happening? Happy to share more
> information if that helps.
>
> Thanks
> --------------------------------------------------
> Dhruv Kumar
> PhD Candidate
> Department of Computer Science and Engineering
> University of Minnesota
> www.dhruvkumar.me <http://www.dhruvkumar.me/>
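[Editor's note] For anyone following along: the "print every second" change mentioned above corresponds to the CsvSink period in metrics.properties. A sketch of such a sink section (the directory path is illustrative; the actual file used here is in the linked gist):

```properties
# Route all instances' metrics to a CsvSink, polled every 1 second.
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=1
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/spark-metrics/
```

Note that Structured Streaming queries report to this metrics system only when `spark.sql.streaming.metricsEnabled` is set to `true` on the session. Also, the sink samples a gauge on its own clock, so a 1-second period can read the same per-trigger value (or a stale zero) between triggers.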