[ https://issues.apache.org/jira/browse/SPARK-26086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
vijayant soni updated SPARK-26086:
----------------------------------
Description:

We have a Spark Streaming application that reads from Kinesis and writes to Redshift.

*Configuration*:
Number of receivers = 5
Batch interval = 10 mins
spark.streaming.receiver.maxRate = 2000 (records per second)

According to this configuration, the maximum number of records that can be read in a single batch is:

{noformat}
Max records per batch = batch_interval * 60 (convert mins to seconds)
                        * 5 (number of receivers)
                        * 2000 (max records per second per receiver)
10 * 60 * 5 * 2000 = 6,000,000
{noformat}

But the actual number of records per batch is more than this maximum:

Batch I - 6,005,886 records
Batch II - 6,001,623 records
Batch III - 6,010,148 records

Please note that the receivers are not even reading at the max rate; each receiver reads roughly 1,900 records per second.
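The expected-cap arithmetic above, and the size of the overshoot in each batch, can be sketched as follows (a minimal illustration using only the numbers from this report; the variable names are illustrative, not taken from the application):

```python
# Theoretical per-batch cap implied by the configuration in the report.
batch_interval_s = 10 * 60   # 10-minute batch interval, in seconds
num_receivers = 5
max_rate = 2000              # spark.streaming.receiver.maxRate, records/sec/receiver

cap = batch_interval_s * num_receivers * max_rate
print(cap)  # 6000000

# Observed batch sizes from the report, and how far each exceeds the cap.
observed = {"Batch I": 6_005_886, "Batch II": 6_001_623, "Batch III": 6_010_148}
for name, count in observed.items():
    overshoot = count - cap
    print(f"{name}: {count} records, {overshoot} over the cap ({overshoot / cap:.3%})")
```

Each batch exceeds the cap by well under 0.2%, which is consistent with the maxRate limit being enforced per receiver per second rather than as a hard per-batch total.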
> Spark streaming max records per batch interval
> ----------------------------------------------
>
>                 Key: SPARK-26086
>                 URL: https://issues.apache.org/jira/browse/SPARK-26086
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.3.1
>            Reporter: vijayant soni
>            Priority: Major
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)