[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174507#comment-15174507 ]

Cody Koeninger commented on SPARK-12177:
----------------------------------------

Thanks for the example of performance numbers.

The direct stream RDD batch sizes are, by default, "whatever's left in Kafka".
The backpressure and maximum limits on batch sizes are expressed in messages, not
bytes, because message counts can be calculated on the driver without having read
the messages. You can tune this for your app pretty straightforwardly, as long as
you don't have a lot of tiny messages followed by a lot of huge messages in the
same topic.
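
As a minimal sketch (assuming the Spark 1.x direct stream configuration keys),
capping and adapting the per-partition rate looks like this:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("DirectStreamRateLimit")
      // Hard cap: at most 10000 messages per partition per batch.
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")
      // Let the backpressure rate estimator adapt throughput below that cap.
      .set("spark.streaming.backpressure.enabled", "true")

Both limits count messages, so the resulting batch size in bytes still depends
on how big your messages are.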

Doing Kafka offset commits on the executor without user intervention opens up a
whole other can of worms; I'd prefer to avoid that if at all possible.
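
The usual pattern instead is to grab the offset ranges on the driver and commit
them yourself. A rough sketch (saveOffsets is a hypothetical, user-supplied
helper; the HasOffsetRanges cast is how the existing direct stream exposes
offsets):

    import org.apache.spark.streaming.dstream.DStream
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

    // Hypothetical helper: persist offsets to ZK, Kafka, or your own store,
    // ideally atomically with your output.
    def saveOffsets(ranges: Array[OffsetRange]): Unit = ???

    def processWithOffsets(stream: DStream[(String, String)]): Unit =
      stream.foreachRDD { rdd =>
        // Only the initial KafkaRDD implements HasOffsetRanges, so take the
        // ranges here on the driver, before any transformation.
        val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        // ... process rdd ...
        saveOffsets(ranges)
      }

That keeps the commit decision in user code rather than having executors commit
automatically.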

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --------------------------------------------------
>
>                 Key: SPARK-12177
>                 URL: https://issues.apache.org/jira/browse/SPARK-12177
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.6.0
>            Reporter: Nikita Tarasenko
>              Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that
> is not compatible with the old one. So, I added the new consumer API in separate
> classes in the package org.apache.spark.streaming.kafka.v09, with the changed API.
> I didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.


