Thanks a lot for your mail, Jungtaek. I added a StreamingQueryListener to my 
code (updated code 
<https://gist.github.com/kudhru/e1ce6b3f399c546be5eeb1f590087992>) and was able 
to see valid inputRowsPerSecond and processedRowsPerSecond numbers. But it also 
shows zeros intermittently. Here is the sample output 
<https://gist.github.com/kudhru/db2bced789c528464620ae1767597127>. Could you 
explain why this is the case? 
Unfortunately, the CSV files still show mostly zeros, with only a few non-zero 
values. Do you know why this may be happening? (I changed metrics.properties to 
report every second instead of every 10 seconds.)
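For context, a one-second CsvSink in metrics.properties is configured along these lines (the output directory below is a placeholder, not the actual path from my setup):

```properties
# Report all metrics to the CSV sink every second
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=1
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/metrics
```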
Here is the output of the metrics log file 
(run_latest.driver.spark.streaming.aggregates.inputRate-total.csv)
t,value
1529645042,0.0
1529645043,0.0
1529645044,0.0
1529645045,NaN
1529645046,88967.97153024911
1529645047,100200.4008016032
1529645048,122100.12210012211
1529645049,0.0
1529645050,0.0
1529645051,0.0
1529645052,0.0
1529645053,0.0
1529645054,0.0
1529645055,0.0
1529645056,0.0
1529645057,0.0
1529645058,0.0
1529645059,0.0
1529645060,0.0
1529645061,0.0
1529645062,0.0
1529645063,0.0
1529645064,0.0
1529645065,0.0
1529645066,0.0
1529645067,0.0
1529645068,0.0
1529645069,0.0
1529645070,0.0
1529645071,0.0
1529645072,93808.63039399624
1529645073,0.0
1529645074,0.0
1529645075,0.0
1529645076,0.0
1529645077,0.0
1529645078,0.0
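As a side note, a dump like the one above can be post-processed to ignore the idle samples. A minimal sketch (a hypothetical helper, assuming the exact `t,value` layout shown above):

```python
def nonzero_rates(csv_text):
    """Parse a 't,value' metrics CSV dump and return the non-zero,
    non-NaN rate samples as a list of floats."""
    rates = []
    for line in csv_text.strip().splitlines():
        if line.startswith("t,"):  # skip the header row
            continue
        _, value = line.split(",")
        rate = float(value)
        if rate == rate and rate > 0.0:  # rate == rate is False for NaN
            rates.append(rate)
    return rates
```

Applied to the log above, this keeps only the four non-zero bursts, which makes it easier to see how sparse the reported rate actually is.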



--------------------------------------------------
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

> On Jun 21, 2018, at 23:07, Jungtaek Lim <kabh...@gmail.com> wrote:
> 
> I'm referring to 2.4.0-SNAPSHOT (not sure which commit I'm on), but it 
> properly returns the input rate.
> 
> $ tail -F 
> /tmp/spark-trial-metric/local-1529640063554.driver.spark.streaming.counts.inputRate-total.csv
> t,value
> 1529640073,0.0
> 1529640083,0.9411272613196695
> 1529640093,0.9430996541967934
> 1529640103,1.0606060606060606
> 1529640113,0.9997000899730081
> 
> Could you add a streaming query listener and check the values of sources -> 
> numInputRows, inputRowsPerSecond, and processedRowsPerSecond? They should 
> provide some valid numbers.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> On Fri, Jun 22, 2018 at 11:49 AM, Dhruv Kumar <gargdhru...@gmail.com 
> <mailto:gargdhru...@gmail.com>> wrote:
> Hi
> 
> I was trying to measure performance metrics for Spark Structured Streaming, 
> but I am unable to see any data in the metrics log files. My input 
> source is the Rate source 
> <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#creating-streaming-dataframes-and-streaming-datasets>, 
>  which generates data at a specified number of rows per second. Here is the 
> link to my code 
> <https://gist.github.com/kudhru/e1ce6b3f399c546be5eeb1f590087992> and 
> metrics.properties 
> <https://gist.github.com/kudhru/5d8a8f4d53c766e9efad4de2ae9b82d6> file.
> 
> When I run the above-mentioned code using spark-submit, I see that the 
> metrics logs (for example, 
> run_1.driver.spark.streaming.aggregates.inputRate-total.csv) are created 
> under the specified directory, but most of the values are 0. 
> Below is a portion of the inputRate-total.csv file:
> 1529634585,0.0
> 1529634595,0.0
> 1529634605,0.0
> 1529634615,0.0
> 1529634625,0.0
> 1529634635,0.0
> 1529634645,0.0
> 1529634655,0.0
> 1529634665,0.0
> 1529634675,0.0
> 1529634685,0.0
> 1529634695,0.0
> 1529634705,0.0
> 1529634715,0.0
> 
> Any reason as to why this must be happening? Happy to share more information 
> if that helps.
> 
> Thanks
> --------------------------------------------------
> Dhruv Kumar
> PhD Candidate
> Department of Computer Science and Engineering
> University of Minnesota
> www.dhruvkumar.me <http://www.dhruvkumar.me/>
