vkorukanti opened a new pull request #33013:
URL: https://github.com/apache/spark/pull/33013


   
   ### What changes were proposed in this pull request?
   
   Fix how the metric `allUpdatesTimeMs` is measured in 
`FlatMapGroupsWithStateExec` so that it matches the other streaming stateful 
operators.
   
   ### Why are the changes needed?
   
   The metric `allUpdatesTimeMs` is meant to capture the start-to-end wall-clock 
time of the operator `FlatMapGroupsWithStateExec`, but currently it only 
[captures](https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121)
 the iterator creation time.
   
   Fix it to measure the metric the same way other stateful operators do; see 
an example 
[here](https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406).
 The measurement is not perfect: because the iterator is lazy, it also includes 
the time the consumer operator spends processing this operator's output. It 
should still give a good signal when comparing the metric in one microbatch to 
the metric in another microbatch.
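   
The difference between timing iterator creation and timing until the iterator is exhausted can be sketched in plain Scala, without any Spark dependencies. This is only an illustration, not the code in this patch: `ManualClock`, `OnCompletionIterator`, `process`, `creationOnlyMs`, and `startToEndMs` are hypothetical names, and the manual clock stands in for wall-clock time so that the numbers are deterministic.

```scala
// Illustrative sketch only: plain Scala, all names are hypothetical.
object LazyIteratorTiming {
  // A manual clock keeps the example deterministic; real code would read wall time.
  final class ManualClock {
    var nowMs: Long = 0L
    def advance(ms: Long): Unit = nowMs += ms
  }

  // Wraps an iterator and runs `onCompletion` once, the first time
  // `hasNext` reports that the underlying iterator is exhausted.
  final class OnCompletionIterator[A](underlying: Iterator[A], onCompletion: () => Unit)
      extends Iterator[A] {
    private var completed = false
    override def hasNext: Boolean = {
      val more = underlying.hasNext
      if (!more && !completed) {
        completed = true
        onCompletion()
      }
      more
    }
    override def next(): A = underlying.next()
  }

  // Simulated per-row work: each row costs 10 ms on the manual clock.
  // `Iterator.map` is lazy, so nothing runs until the result is consumed.
  def process(rows: Iterator[Int], clock: ManualClock): Iterator[Int] =
    rows.map { r => clock.advance(10); r * 2 }

  // Times only the creation of the lazy iterator; no row has been processed yet.
  def creationOnlyMs(rows: Seq[Int], clock: ManualClock): (Long, Iterator[Int]) = {
    val start = clock.nowMs
    val out = process(rows.iterator, clock)
    (clock.nowMs - start, out)
  }

  // Records elapsed time when the output iterator is exhausted. The result
  // therefore also includes whatever time the consumer spends per row.
  def startToEndMs(rows: Seq[Int], clock: ManualClock)(consume: Iterator[Int] => Unit): Long = {
    val start = clock.nowMs
    var elapsed = -1L
    val out = new OnCompletionIterator(
      process(rows.iterator, clock),
      () => { elapsed = clock.nowMs - start }
    )
    consume(out)
    elapsed
  }
}
```

With three input rows, `creationOnlyMs` reports 0 ms because no row has been processed when the timer stops. `startToEndMs` with a consumer that spends 5 ms per row reports 45 ms: 30 ms of operator work plus 15 ms of consumer time, which is the imprecision described above.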
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Existing UTs cover regressions. Because the metric is a time measurement, it 
is hard to write a UT for it, but the change has been verified manually.
   
   Closes #32952 from vkorukanti/SPARK-35799.
   
   Authored-by: Venki Korukanti <venki.koruka...@gmail.com>
   Signed-off-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
   




