[ https://issues.apache.org/jira/browse/SPARK-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593171#comment-14593171 ]

Dibyendu Bhattacharya commented on SPARK-8474:
----------------------------------------------

https://kafka.apache.org/08/configuration.html

It says for fetch.message.max.bytes:

"The number of bytes of messages to attempt to fetch for each topic-partition in 
each fetch request. These bytes will be read into memory for each partition, so 
this helps control the memory used by the consumer. The fetch request size must 
be at least as large as the maximum message size the server allows or else it 
is possible for the producer to send messages larger than the consumer can 
fetch."

So this is not a per-message limit, but the size of the messages you fetch in each 
FetchRequest built with FetchRequestBuilder.
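
For context, a minimal sketch of how that fetch size reaches the direct stream, 
assuming the Spark 1.4 spark-streaming-kafka (Kafka 0.8) connector; the broker 
address, topic name, and the 8 MB value are placeholders for illustration, not 
recommendations:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf()
      .setAppName("direct-stream-fetch-size")
      // Upper bound on messages pulled per partition per batch.
      .set("spark.streaming.kafka.maxRatePerPartition", "2000")
    val ssc = new StreamingContext(conf, Seconds(10))

    // fetch.message.max.bytes is handed down to the underlying SimpleConsumer
    // and bounds each FetchRequest per topic-partition, not each message.
    val kafkaParams = Map(
      "metadata.broker.list"    -> "broker1:9092",             // placeholder broker
      "fetch.message.max.bytes" -> (8 * 1024 * 1024).toString  // 8 MB per fetch (example)
    )

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))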

> [STREAMING] Kafka DirectStream API stops receiving messages if collective 
> size of the messages specified in spark.streaming.kafka.maxRatePerPartition 
> exceeds the default fetch size ( fetch.message.max.bytes) of SimpleConsumer
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-8474
>                 URL: https://issues.apache.org/jira/browse/SPARK-8474
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.4.0
>            Reporter: Dibyendu Bhattacharya
>            Priority: Critical
>
> The issue is, if Kafka holds variable-size messages ranging from a few KB to a 
> few hundred KB, setting the rate limit by number of messages can lead to a 
> problem.
> Say the message sizes in Kafka are such that under the default 
> fetch.message.max.bytes (which is 1 MB) ONLY 1000 messages can be pulled, 
> whereas spark.streaming.kafka.maxRatePerPartition is set to, say, 2000. With 
> these settings, when the Kafka RDD pulls messages for its offset range, it 
> pulls only 1000 messages (limited by the size of the pull in the SimpleConsumer 
> API) and can never reach the desired untilOffset, so KafkaRDD fails in this 
> assert call:
> assert(requestOffset == part.untilOffset, errRanOutBeforeEnd(part))
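
A back-of-the-envelope sketch of the mismatch described in the quoted report, 
using only the numbers it gives (a 1 MB default fetch that holds roughly 1000 
messages against a 2000-message rate limit); the average message size is simply 
what those figures imply:

    // Numbers taken from the issue description above.
    val maxRatePerPartition     = 2000             // spark.streaming.kafka.maxRatePerPartition
    val defaultFetchBytes       = 1 * 1024 * 1024  // default fetch.message.max.bytes (1 MB)
    val messagesPerDefaultFetch = 1000             // what fits in one 1 MB fetch, per the report

    // Implied average message size in this scenario (~1 KB).
    val avgMessageBytes = defaultFetchBytes / messagesPerDefaultFetch

    // For one fetch to cover a whole per-partition batch, the fetch size would
    // need to be at least maxRatePerPartition * avgMessageBytes (~2 MB here);
    // otherwise the batch is capped by bytes rather than by message count.
    val fetchBytesToCoverBatch = maxRatePerPartition * avgMessageBytes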


