[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-18371:
------------------------------------

    Assignee: (was: Apache Spark)

> Spark Streaming backpressure bug - generates a batch with large number of records
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-18371
>                 URL: https://issues.apache.org/jira/browse/SPARK-18371
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: mapreduced
>         Attachments: GiantBatch2.png, GiantBatch3.png, Giant_batch_at_23_00.png, Look_at_batch_at_22_14.png
>
> When the streaming job is configured with backpressureEnabled=true, it generates a GIANT batch of records if the processing time plus scheduling delay is (much) larger than batchDuration. This creates an enormous backlog of records and results in batches queueing for hours until the job chews through the giant batch.
> The expectation is that backpressure should, within a reasonable time, reduce the number of records per batch to whatever the job can actually process.
> Attaching some screenshots; the issue appears to be easily reproducible.

-- 
This message was sent by Atlassian JIRA (v6.3.15#6346)
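Until the underlying estimator issue is fixed, the usual mitigation is to combine backpressure with explicit rate caps, so that no single batch (including the first one after a delay) can grow unbounded. A minimal sketch, assuming Spark 2.0 Streaming; `spark.streaming.backpressure.enabled`, `spark.streaming.backpressure.initialRate`, `spark.streaming.kafka.maxRatePerPartition`, and `spark.streaming.receiver.maxRate` are real Spark settings, while the application name, rate values, and batch interval are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Workaround sketch: keep backpressure on, but also set hard upper
// bounds so a large processing time + scheduling delay cannot yield
// one giant batch. All numeric values below are illustrative.
val conf = new SparkConf()
  .setAppName("backpressure-demo") // illustrative name
  .set("spark.streaming.backpressure.enabled", "true")
  // Cap the first batch, before the rate estimator has any feedback
  .set("spark.streaming.backpressure.initialRate", "1000")
  // Hard cap per Kafka partition (records/sec) for direct streams
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // Hard cap (records/sec) for receiver-based streams
  .set("spark.streaming.receiver.maxRate", "1000")

val ssc = new StreamingContext(conf, Seconds(10)) // 10s batchDuration, illustrative
```

With these caps in place, backpressure can still lower the ingest rate below the cap, but a batch scheduled after a long delay is bounded by `maxRatePerPartition * partitions * batchDuration` records instead of everything that accumulated upstream.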