[ https://issues.apache.org/jira/browse/SPARK-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yongjia Wang updated SPARK-11152:
---------------------------------
    Priority: Major (was: Minor)

> Streaming UI: Input sizes are 0 for makeup batches started from a checkpoint
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-11152
>                 URL: https://issues.apache.org/jira/browse/SPARK-11152
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming, Web UI
>            Reporter: Yongjia Wang
>
> Suppose a streaming job is resumed from a checkpoint at batch time x, and
> the current time when the job resumes is x+10. Because Spark schedules the
> missing batches x+1 through x+10 without any metadata, it packs all of the
> backlogged input into batch x+1 and then assigns any new input to batches
> x+2 through x+10 immediately, without waiting for their batch intervals.
> The result is a run of tiny batches that capture only the input arriving
> during the back-to-back scheduling. This behavior is very reasonable.
> However, the streaming UI does not show the correct input sizes for these
> makeup batches: they are all reported as 0, from batch x through x+10.
> Fixing this would be very helpful. I see this with Kafka direct streaming,
> and I assume it happens with the other streaming sources as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
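To make the reported scheduling concrete, here is a toy Python model of how the description says input is spread over the makeup batches. It is a sketch under stated assumptions, not Spark's scheduler code: the `Batch` class, the `makeup_batches` function, and all record counts are hypothetical, and times are counted in batch intervals (x, x+1, ...). The `num_records` of each batch is the input size the Streaming UI would be expected to show instead of 0.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    time: int         # batch time, counted in batch intervals (x, x+1, ...)
    num_records: int  # input size the Streaming UI would be expected to show


def makeup_batches(checkpoint_time, resume_time, backlog, catchup_arrivals):
    """Model the batches skipped between a checkpoint and the resume time.

    Hypothetical model of the behavior described in SPARK-11152, not Spark code:
    - backlog: records that accumulated while the job was down; all of them
      go to the first makeup batch (checkpoint_time + 1).
    - catchup_arrivals: records arriving while the remaining makeup batches
      are submitted back to back; entry i goes to batch checkpoint_time + 2 + i.
    """
    missing = list(range(checkpoint_time + 1, resume_time + 1))
    if not missing:
        return []
    if len(catchup_arrivals) != len(missing) - 1:
        raise ValueError("need one arrival count per makeup batch after the first")
    batches = [Batch(missing[0], backlog)]
    for t, n in zip(missing[1:], catchup_arrivals):
        batches.append(Batch(t, n))
    return batches


if __name__ == "__main__":
    # x = 100, resumed at x + 10; the record counts are made up for illustration.
    for b in makeup_batches(100, 110, 50_000, [3, 0, 1, 2, 0, 0, 1, 0, 4]):
        print(f"batch {b.time}: {b.num_records} records")
```

In this model only batch x+1 carries the backlog; the later makeup batches are small but not necessarily empty, so a UI that reports 0 for every one of them, including x+1, hides how the recovered input was actually distributed.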