[ https://issues.apache.org/jira/browse/SPARK-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjia Wang updated SPARK-11152:
---------------------------------
    Description: Suppose a streaming job is resumed from a checkpoint taken at 
batch time x, and the current time when we resume it is x+10. Since Spark 
schedules the missing batches from x+1 to x+10 without any metadata, it packs 
all of the backlogged input into batch x+1 and then assigns any new input to 
batches x+2 through x+10 immediately, without waiting. This results in tiny 
batches that capture input only during the back-to-back scheduling intervals. 
That behavior is very reasonable. However, the streaming UI does not show the 
input sizes for these makeup batches correctly - they are all 0 from batch x 
to x+10. Fixing this would be very helpful. I hit this with Kafka direct 
streaming; I assume it would happen with all other streaming sources as 
well.  (was: When a streaming job
starts from a checkpoint at batch time x, and say the current time when we 
resume this streaming job is x+10. In this scenario, since Spark will schedule 
the missing batches from x+1 to x+10 without any metadata, the behavior is to 
pack up all the backlogged inputs into batch x+1, then assign any new inputs 
into x+2 to x+10 immediately without waiting. This results in tiny batches that 
capture inputs only during the back to back scheduling intervals. This behavior 
is very reasonable. However, the streaming UI does not show correctly the input 
sizes for all these makeup batches - they are all 0 from batch x to x+10. 
Fixing this would be very helpful. This happens when I use Kafka direct 
streaming, I assume this would happen for all other streaming sources as well.)

> Streaming UI: Input sizes are 0 for makeup batches started from a checkpoint 
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-11152
>                 URL: https://issues.apache.org/jira/browse/SPARK-11152
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming, Web UI
>            Reporter: Yongjia Wang
>            Priority: Minor
>
> Suppose a streaming job is resumed from a checkpoint taken at batch time x, 
> and the current time when we resume it is x+10. Since Spark schedules the 
> missing batches from x+1 to x+10 without any metadata, it packs all of the 
> backlogged input into batch x+1 and then assigns any new input to batches 
> x+2 through x+10 immediately, without waiting. This results in tiny batches 
> that capture input only during the back-to-back scheduling intervals. That 
> behavior is very reasonable. However, the streaming UI does not show the 
> input sizes for these makeup batches correctly - they are all 0 from batch x 
> to x+10. Fixing this would be very helpful. I hit this with Kafka direct 
> streaming; I assume it would happen with all other streaming sources as well.
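
The makeup-batch scheduling described above can be sketched as a toy model. This is plain Python, not Spark's actual scheduler; the function name and the 1-unit batch interval are illustrative assumptions, used only to show why the backlog lands in batch x+1 while the remaining makeup batches stay tiny:

```python
# Toy model (NOT Spark code) of makeup-batch scheduling after a checkpoint
# restart. Assumes a 1-unit batch interval; all names are illustrative.

def schedule_makeup_batches(backlog, checkpoint_time, resume_time):
    """Assign input records to the missed batch times
    checkpoint_time+1 .. resume_time.

    All backlogged records are packed into the first makeup batch; the
    remaining makeup batches are scheduled back to back, so they capture
    (almost) no input.
    """
    batches = {}
    first = checkpoint_time + 1
    batches[first] = list(backlog)   # entire backlog goes into batch x+1
    for t in range(first + 1, resume_time + 1):
        batches[t] = []              # near-empty back-to-back makeup batches
    return batches

# Resume at x+10 from a checkpoint at x=0, with 3 backlogged records.
batches = schedule_makeup_batches(["r1", "r2", "r3"],
                                  checkpoint_time=0, resume_time=10)
print(len(batches[1]))   # 3 - the whole backlog
print(len(batches[2]))   # 0 - a tiny makeup batch
```

The reported bug is that the streaming UI shows 0 input records even for the first makeup batch, which in this model actually carries the whole backlog.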



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
