[ https://issues.apache.org/jira/browse/SPARK-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048879#comment-15048879 ]
Cody Koeninger commented on SPARK-12203:
----------------------------------------

Commented on the PR. I don't think this makes sense for inclusion in Spark, at least in its current state. I think efforts towards minimizing latency of the direct stream (assuming that just tuning your batch sizes smaller isn't sufficient) would be better spent pursuing pre-fetching/caching on the executors, but that's a noticeable increase in complexity.

> Add KafkaDirectInputDStream that directly pulls messages from Kafka Brokers
> using receivers
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12203
>                 URL: https://issues.apache.org/jira/browse/SPARK-12203
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Liang-Chi Hsieh
>
> Currently, we have DirectKafkaInputDStream, which directly pulls messages
> from Kafka Brokers without any receivers, and KafkaInputDStream, which pulls
> messages from a Kafka Broker using a receiver with ZooKeeper.
> As we observed, because DirectKafkaInputDStream retrieves messages from Kafka
> after each batch finishes, it introduces latency compared with
> KafkaInputDStream, which continues to pull messages during each batch window.
> So we propose adding KafkaDirectInputDStream, which pulls messages directly
> from Kafka Brokers like DirectKafkaInputDStream, but uses receivers like
> KafkaInputDStream so that it pulls messages during each batch window.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
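A back-of-the-envelope model may make the latency argument in this thread more concrete. The sketch below is hypothetical and is not Spark code: `completion_latency` and its parameters are invented for illustration. It assumes that each message joins the batch that starts at the next batch boundary. A direct stream then spends `fetch` seconds pulling that batch's data from Kafka before `process` seconds of work. A receiver (or executor-side pre-fetching) has already pulled the data by the time the batch starts.

```python
import math


def completion_latency(arrival: float, batch: float, fetch: float,
                       process: float, prefetch: bool) -> float:
    """Hypothetical toy model, not Spark code: seconds from a message's
    arrival until the batch containing it finishes processing.

    Assumes the message joins the batch that starts at the next boundary.
    Without prefetching, that batch must first pull its data from Kafka
    (`fetch` seconds) before processing it (`process` seconds); with
    prefetching, the data is already local when the batch starts.
    """
    boundary = math.ceil(arrival / batch) * batch  # next batch boundary
    work = process if prefetch else fetch + process
    return boundary + work - arrival


# A message arriving at t = 0.5 s, with an assumed fetch cost of 0.3 s and
# processing cost of 0.2 s, compared under two batch intervals.
for batch in (2.0, 1.0):
    direct = completion_latency(0.5, batch, 0.3, 0.2, prefetch=False)
    pulled = completion_latency(0.5, batch, 0.3, 0.2, prefetch=True)
    print(f"batch={batch}s direct={direct:.1f}s prefetched={pulled:.1f}s")
```

Under these assumptions, pre-fetching removes only the fetch term, while a smaller batch interval shrinks the wait for the next boundary. Which one matters more depends on the real fetch cost relative to the batch interval, and that is the trade-off between tuning batch sizes and adding pre-fetching/caching raised in the comment.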