[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940405#comment-16940405 ]

Sebastian Arzt commented on SPARK-18371:
----------------------------------------

[~rkarthikeyan] at first glance I cannot find backpressure support in the 
Kinesis receiver yet. I think your problem should be investigated 
independently. I suggest creating a new ticket with instructions on how to 
reproduce your findings.

> Spark Streaming backpressure bug - generates a batch with large number of 
> records
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-18371
>                 URL: https://issues.apache.org/jira/browse/SPARK-18371
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: mapreduced
>            Assignee: Sebastian Arzt
>            Priority: Major
>             Fix For: 2.4.0
>
>         Attachments: 01.png, 02.png, GiantBatch2.png, GiantBatch3.png, 
> Giant_batch_at_23_00.png, Look_at_batch_at_22_14.png, Screen Shot 2019-09-16 
> at 12.27.25 PM.png
>
>
> When the streaming job is configured with backpressureEnabled=true, it 
> generates a GIANT batch of records if the processing time + scheduled delay 
> is (much) larger than batchDuration. This creates a huge backlog of records 
> and results in batches queueing for hours until the job chews through the 
> giant batch.
> The expectation is that, within some time, it should reduce the number of 
> records per batch to whatever it can actually process.
> Attaching some screenshots which suggest that this issue is quite easily 
> reproducible.


