Jungtaek Lim created SPARK-29314: ------------------------------------ Summary: ProgressReporter.extractStateOperatorMetrics should not overwrite updated as 0 when it actually runs a batch even with no data Key: SPARK-29314 URL: https://issues.apache.org/jira/browse/SPARK-29314 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.4, 3.0.0 Reporter: Jungtaek Lim
SPARK-24156 brought the ability to run a batch without actual data to enable fast state cleanup as well as emit evicted outputs without waiting actual data to come. This breaks some assumption on `ProgressReporter.extractStateOperatorMetrics`. See comment in source code: {code:java} // lastExecution could belong to one of the previous triggers if `!hasNewData`. // Walking the plan again should be inexpensive. {code} and newNumRowsUpdated is replaced to 0 if hasNewData is false. It makes sense if we copy progress from previous execution (which means no batch is run for this time), but after SPARK-24156 the precondition is broken. Spark should still replace the value of newNumRowsUpdated with 0 if there's no batch being run and it needs to copy the old value from previous execution, but it shouldn't touch the value if it runs a batch for no data. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org