[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-18371:
------------------------------------

    Assignee: (was: Apache Spark)

> Spark Streaming backpressure bug - generates a batch with large number of records
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-18371
>                 URL: https://issues.apache.org/jira/browse/SPARK-18371
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: mapreduced
>         Attachments: GiantBatch2.png, GiantBatch3.png, Giant_batch_at_23_00.png, Look_at_batch_at_22_14.png
>
> When the streaming job is configured with backpressureEnabled=true, it generates a GIANT batch of records if the processing time plus scheduling delay is (much) larger than batchDuration. This creates an enormous backlog of records and results in batches queueing for hours until the job chews through the giant batch.
> The expectation is that backpressure should, within a reasonable time, reduce the number of records per batch to whatever the job can actually process.
> Attaching some screenshots; the issue appears to be easily reproducible.

-- 
This message was sent by Atlassian JIRA (v6.3.15#6346)
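Until the underlying estimator issue is fixed, the usual mitigation is to combine backpressure with explicit rate caps, so that no single batch (including the first one after a delay) can grow unbounded. A minimal sketch, assuming Spark 2.0 Streaming; `spark.streaming.backpressure.enabled`, `spark.streaming.backpressure.initialRate`, `spark.streaming.kafka.maxRatePerPartition`, and `spark.streaming.receiver.maxRate` are real Spark settings, while the application name, rate values, and batch interval are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Workaround sketch: keep backpressure on, but also set hard upper
// bounds so a large processing time + scheduling delay cannot yield
// one giant batch. All numeric values below are illustrative.
val conf = new SparkConf()
  .setAppName("backpressure-demo") // illustrative name
  .set("spark.streaming.backpressure.enabled", "true")
  // Cap the first batch, before the rate estimator has any feedback
  .set("spark.streaming.backpressure.initialRate", "1000")
  // Hard cap per Kafka partition (records/sec) for direct streams
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // Hard cap (records/sec) for receiver-based streams
  .set("spark.streaming.receiver.maxRate", "1000")

val ssc = new StreamingContext(conf, Seconds(10)) // 10s batchDuration, illustrative
```

With these caps in place, backpressure can still lower the ingest rate below the cap, but a batch scheduled after a long delay is bounded by `maxRatePerPartition * partitions * batchDuration` records instead of everything that accumulated upstream.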