[jira] [Resolved] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

Cody Koeninger (JIRA) Fri, 16 Mar 2018 10:34:18 -0700

     [ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Cody Koeninger resolved SPARK-18371.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

> Spark Streaming backpressure bug - generates a batch with large number of 
> records
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-18371
>                 URL: https://issues.apache.org/jira/browse/SPARK-18371
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: mapreduced
>            Assignee: Sebastian Arzt
>            Priority: Major
>             Fix For: 2.4.0
>
>         Attachments: 01.png, 02.png, GiantBatch2.png, GiantBatch3.png, 
> Giant_batch_at_23_00.png, Look_at_batch_at_22_14.png
>
>
> When a streaming job is configured with backpressureEnabled=true, it 
> generates one giant batch of records if the processing time plus scheduling 
> delay is much larger than batchDuration. This creates an enormous backlog, 
> and subsequent batches queue for hours until the job works through the 
> giant batch.
> The expected behavior is that backpressure reduces the number of records 
> per batch over time to what the job can actually process.
> The attached screenshots show the problem, which appears to be easily 
> reproducible.
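
The report does not say which input source or rate limits were in use. As a rough sketch only, assuming a receiver-based DStream, the snippet below lists the standard Spark Streaming rate properties and computes the ceiling that a static maximum rate places on a single batch, whatever rate backpressure estimates. The values, the stream count, and the helper name max_records_per_batch are hypothetical and chosen for illustration.

```python
# Standard Spark Streaming property names; the values are illustrative
# assumptions, not settings taken from the report.
streaming_conf = {
    "spark.streaming.backpressure.enabled": "true",
    # Rate used for the first batch, before any feedback exists.
    "spark.streaming.backpressure.initialRate": "1000",
    # Hard cap in records per second per receiver.
    "spark.streaming.receiver.maxRate": "5000",
}


def max_records_per_batch(max_rate_per_sec, num_streams, batch_seconds):
    """Upper bound on records in one batch under a static rate cap.

    Hypothetical helper: num_streams is the number of receivers (or Kafka
    partitions when spark.streaming.kafka.maxRatePerPartition is used).
    """
    return int(max_rate_per_sec * num_streams * batch_seconds)


if __name__ == "__main__":
    cap = int(streaming_conf["spark.streaming.receiver.maxRate"])
    # Assumed: 4 receivers and a 10 second batchDuration.
    print(max_records_per_batch(cap, num_streams=4, batch_seconds=10))
```

With a cap like this in place, a batch cannot exceed the computed bound even if the backpressure estimate misbehaves. For the direct Kafka stream the corresponding property is spark.streaming.kafka.maxRatePerPartition.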



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
