[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940405#comment-16940405 ]
Sebastian Arzt commented on SPARK-18371:
----------------------------------------

[~rkarthikeyan] at first glance I cannot find backpressure support in the Kinesis receiver yet. I think your problem should be investigated independently; I suggest creating a new ticket with instructions to reproduce your findings.

> Spark Streaming backpressure bug - generates a batch with large number of records
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-18371
>                 URL: https://issues.apache.org/jira/browse/SPARK-18371
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: mapreduced
>            Assignee: Sebastian Arzt
>            Priority: Major
>             Fix For: 2.4.0
>
>         Attachments: 01.png, 02.png, GiantBatch2.png, GiantBatch3.png, Giant_batch_at_23_00.png, Look_at_batch_at_22_14.png, Screen Shot 2019-09-16 at 12.27.25 PM.png
>
>
> When the streaming job is configured with backpressureEnabled=true, it generates a GIANT batch of records if the processing time plus scheduling delay is (much) larger than batchDuration. This creates an enormous backlog of records and results in batches queueing for hours until the job chews through the giant batch.
> The expectation is that backpressure should, within some bounded time, reduce the number of records per batch to whatever the job can actually process.
> Attaching some screenshots showing that the issue seems easily reproducible.
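To see why a large scheduling delay should drive the rate down rather than let a giant batch form, here is a minimal sketch in the spirit of Spark's PID-based rate estimator. The function name, gain constants, and min-rate floor are illustrative assumptions, not Spark's exact implementation:

```python
def compute_new_rate(latest_rate, processing_delay_ms, scheduling_delay_ms,
                     num_elements, batch_interval_ms,
                     kp=1.0, ki=0.2, min_rate=100.0):
    """Return a new records-per-second limit for the next batch.

    Illustrative sketch: the proportional term reacts to the gap between
    the current rate limit and the rate actually sustained, while the
    integral-style term penalizes records already queued (scheduling delay).
    """
    if num_elements <= 0 or processing_delay_ms <= 0:
        return latest_rate
    # Rate actually sustained while processing the last batch (records/sec).
    processing_rate = num_elements / (processing_delay_ms / 1000.0)
    # How far the configured rate overshot what was sustainable.
    error = latest_rate - processing_rate
    # Backlog pressure: records implied by the time batches spent queued.
    historical_error = (scheduling_delay_ms * processing_rate) / batch_interval_ms
    new_rate = latest_rate - kp * error - ki * historical_error
    # Clamp so the rate never collapses to zero (or goes negative).
    return max(new_rate, min_rate)

# A batch of 1000 records took 2s to process (only 500 rec/s sustainable)
# and sat queued for 5s: the estimator slashes the rate to the floor.
print(compute_new_rate(1000.0, 2000, 5000, 1000, 1000))  # -> 100.0

# Processing was fast (2000 rec/s) with no queueing: the rate ramps up.
print(compute_new_rate(1000.0, 500, 0, 1000, 1000))      # -> 2000.0
```

The bug report describes the opposite behavior: with a large scheduling delay, the computed limit should shrink toward the sustainable rate, not allow one batch to absorb the entire backlog.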