Hi, could someone please take a look at the thread below? Any help is deeply appreciated.
--------------------------------------------------
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

> On Jun 22, 2018, at 13:12, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>
> I actually tried the File source (reading CSV files as a stream and
> processing them). The File source seems to generate valid numbers in the
> metrics log files. I may be wrong, but it looks like an issue with how the
> Rate source's metrics end up in the metrics log files.
>
>> On Jun 22, 2018, at 00:35, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>>
>> Thanks a lot for your mail, Jungtaek. I added the StreamingQueryListener
>> to my code (updated code:
>> https://gist.github.com/kudhru/e1ce6b3f399c546be5eeb1f590087992) and was
>> able to see valid inputRowsPerSecond and processedRowsPerSecond numbers.
>> But it also shows zeros intermittently. Here is the sample output:
>> https://gist.github.com/kudhru/db2bced789c528464620ae1767597127
>> Could you explain why this is the case?
>> Unfortunately, the CSV files still show mostly zeros, with only a few
>> non-zero values. Do you know why this may be happening? (I changed
>> metrics.properties to print every second instead of every 10 seconds.)
>>
>> Here is the output of the metrics log file
>> (run_latest.driver.spark.streaming.aggregates.inputRate-total.csv):
>>
>> t,value
>> 1529645042,0.0
>> 1529645043,0.0
>> 1529645044,0.0
>> 1529645045,NaN
>> 1529645046,88967.97153024911
>> 1529645047,100200.4008016032
>> 1529645048,122100.12210012211
>> 1529645049,0.0
>> 1529645050,0.0
>> 1529645051,0.0
>> 1529645052,0.0
>> 1529645053,0.0
>> 1529645054,0.0
>> 1529645055,0.0
>> 1529645056,0.0
>> 1529645057,0.0
>> 1529645058,0.0
>> 1529645059,0.0
>> 1529645060,0.0
>> 1529645061,0.0
>> 1529645062,0.0
>> 1529645063,0.0
>> 1529645064,0.0
>> 1529645065,0.0
>> 1529645066,0.0
>> 1529645067,0.0
>> 1529645068,0.0
>> 1529645069,0.0
>> 1529645070,0.0
>> 1529645071,0.0
>> 1529645072,93808.63039399624
>> 1529645073,0.0
>> 1529645074,0.0
>> 1529645075,0.0
>> 1529645076,0.0
>> 1529645077,0.0
>> 1529645078,0.0
>>
>>> On Jun 21, 2018, at 23:07, Jungtaek Lim <kabh...@gmail.com> wrote:
>>>
>>> I'm referring to 2.4.0-SNAPSHOT (not sure which commit), and it properly
>>> returns the input rate.
>>>
>>> $ tail -F /tmp/spark-trial-metric/local-1529640063554.driver.spark.streaming.counts.inputRate-total.csv
>>> t,value
>>> 1529640073,0.0
>>> 1529640083,0.9411272613196695
>>> 1529640093,0.9430996541967934
>>> 1529640103,1.0606060606060606
>>> 1529640113,0.9997000899730081
>>>
>>> Could you add a StreamingQueryListener and check the values of sources ->
>>> numInputRows, inputRowsPerSecond, and processedRowsPerSecond? They should
>>> provide some valid numbers.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Fri, Jun 22, 2018, at 11:49 AM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I was trying to measure the performance metrics for Spark Structured
>>> Streaming, but I am unable to see any data in the metrics log files. My
>>> input source is the Rate source
>>> (https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#creating-streaming-dataframes-and-streaming-datasets),
>>> which generates data at a specified number of rows per second. Here are
>>> the links to my code
>>> (https://gist.github.com/kudhru/e1ce6b3f399c546be5eeb1f590087992) and
>>> metrics.properties
>>> (https://gist.github.com/kudhru/5d8a8f4d53c766e9efad4de2ae9b82d6).
>>>
>>> When I run the above code using spark-submit, I see that the metrics logs
>>> (for example, run_1.driver.spark.streaming.aggregates.inputRate-total.csv)
>>> are created under the specified directory, but most of the values are 0.
>>> Below is a portion of the inputRate-total.csv file:
>>>
>>> 1529634585,0.0
>>> 1529634595,0.0
>>> 1529634605,0.0
>>> 1529634615,0.0
>>> 1529634625,0.0
>>> 1529634635,0.0
>>> 1529634645,0.0
>>> 1529634655,0.0
>>> 1529634665,0.0
>>> 1529634675,0.0
>>> 1529634685,0.0
>>> 1529634695,0.0
>>> 1529634705,0.0
>>> 1529634715,0.0
>>>
>>> Any reason why this might be happening? Happy to share more information
>>> if that helps.
>>>
>>> Thanks
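[Editor's note] For reference, the metrics.properties change mentioned in the thread (writing every second instead of every 10 seconds) would look roughly like this for Spark's built-in CsvSink. The actual file is only linked as a gist, so this is a sketch, and the output directory is a placeholder:

```properties
# Write all metrics to CSV files once per second (Spark's default period is 10).
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=1
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/spark-metrics
```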
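[Editor's note] A quick way to quantify the intermittent zeros discussed in the thread is to tally zero, NaN, and non-zero samples in the `t,value` CSV files that Spark's CSV metrics sink writes. A minimal sketch in plain Python, using an abridged copy of the inputRate-total.csv snippet from the thread (the parsing assumes the two-column format shown above):

```python
import math

# Abridged sample in the "t,value" format from the thread's
# inputRate-total.csv snippet.
csv_text = """t,value
1529645042,0.0
1529645045,NaN
1529645046,88967.97153024911
1529645047,100200.4008016032
1529645049,0.0
"""

def summarize(text):
    """Count zero, NaN, and non-zero samples in a t,value metrics file."""
    zeros = nans = nonzero = 0
    for line in text.strip().splitlines()[1:]:  # skip the "t,value" header
        _, value = line.split(",")
        v = float(value)
        if math.isnan(v):
            nans += 1
        elif v == 0.0:
            zeros += 1
        else:
            nonzero += 1
    return zeros, nans, nonzero

print(summarize(csv_text))  # → (2, 1, 2)
```

Run against the full file, this makes it easy to see whether non-zero readings cluster around batch completions or are spread evenly.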
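[Editor's note] One plausible reading of the mostly-zero CSV rows (an assumption on my part, not confirmed anywhere in this thread) is a sampling mismatch: the sink polls the input-rate gauge on its own fixed schedule, while the gauge only carries a non-zero value around the moments a batch reports progress. A toy Python simulation of that effect; all timings and rates here are made up:

```python
# Toy model, not Spark code: a gauge that holds a non-zero rate only for the
# second in which a batch reports progress, polled once per second by a sink.
progress_updates = {5: 88967.9, 35: 93808.6}  # second -> reported rows/sec (made up)

# The sink polls the gauge once per second for 40 seconds.
samples = [progress_updates.get(t, 0.0) for t in range(40)]
nonzero = [s for s in samples if s > 0.0]

# Most polls land between progress updates and record 0.0.
print(len(samples), len(nonzero))  # → 40 2
```

Under this model, a 1-second poll against infrequent batch completions yields exactly the pattern in the thread: long runs of 0.0 punctuated by isolated large values.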