Dibyendu Bhattacharya created SPARK-8474:
--------------------------------------------

             Summary: [STREAMING] Kafka DirectStream API stops receiving 
messages if collective size of the messages specified in 
spark.streaming.kafka.maxRatePerPartition exceeds the default fetch size ( 
fetch.message.max.bytes) of SimpleConsumer
                 Key: SPARK-8474
                 URL: https://issues.apache.org/jira/browse/SPARK-8474
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.4.0
            Reporter: Dibyendu Bhattacharya
            Priority: Critical


The issue is , if in Kafka there are variable size messages ranging from few KB 
to few hundred KBs , setting the rate limiting by number of messages can leads 
to potential issue. 

let say size of messages in Kafka are such that for default  
fetch.message.max.bytes limit ONLY 1000 messages can be pulled, whereas I 
specified the spark.streaming.kafka.maxRatePerPartition limit as say 2000. Now 
with this settings when Kafka RDD  pulls messages for its offset range , it 
will only pull 1000 messages and can never be able to pull messages till the 
desired untilOffset and in KafkaRDD it failed in this assert call..

assert(requestOffset == part.untilOffset, errRanOutBeforeEnd(part))





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to