Re: Re: Many Receiver vs. Many threads per Receiver
Sure, thanks Tathagata!

bit1...@163.com
Re: Re: Many Receiver vs. Many threads per Receiver
As I understand it:

Option 1 helps when you think network bandwidth may be a bottleneck, because Spark opens several network connections, possibly on several different physical machines.

Option 2, as you already pointed out, has the benefit that fewer worker cores are occupied by receiver tasks.

Regards,
Jeff

2015-02-26 9:38 GMT+01:00 bit1...@163.com:

Sure, thanks Tathagata!

--
bit1...@163.com

From: Tathagata Das t...@databricks.com
Date: 2015-02-26 14:47
To: bit1...@163.com
CC: Akhil Das ak...@sigmoidanalytics.com; user user@spark.apache.org
Subject: Re: Re: Many Receiver vs. Many threads per Receiver

Spark Streaming has a new Kafka direct stream, to be released as an experimental feature with 1.3. It uses a low-level consumer. Not sure if it satisfies your purpose. If you want more control, it's best to create your own Receiver with the low-level Kafka API.

TD

On Tue, Feb 24, 2015 at 12:09 AM, bit1...@163.com wrote:

Thanks Akhil. Not sure whether the low-level consumer (https://github.com/dibbhatt/kafka-spark-consumer) will be officially supported by Spark Streaming. So far, I don't see it mentioned or documented in the Spark Streaming programming guide.

--
bit1...@163.com

From: Akhil Das ak...@sigmoidanalytics.com
Date: 2015-02-24 16:21
To: bit1...@163.com
CC: user user@spark.apache.org
Subject: Re: Many Receiver vs. Many threads per Receiver

I believe when you go with option 1, it will distribute the consumers across your cluster (possibly on 6 machines), but I still don't see a way to tell which partition each one will consume from. If you are looking for a consumer where you can specify the partition details, then you are better off with the low-level consumer: https://github.com/dibbhatt/kafka-spark-consumer

Thanks
Best Regards

On Tue, Feb 24, 2015 at 9:36 AM, bit1...@163.com wrote:

Hi,

I am experimenting with Spark Streaming and Kafka integration. To read messages from Kafka in parallel, there are basically two ways:

1. Create many Receivers, like (1 to 6).map(_ => KafkaUtils.createStream).
2. Specify many threads when calling KafkaUtils.createStream, like val topicMap = Map(myTopic -> 6); this will create one receiver with 6 reading threads.

My question is which option is better. Option 2 sounds better to me because it saves a lot of cores (one Receiver, one core), but I learned from somewhere else that option 1 is better, so I would like to ask and see how you guys elaborate on this.

Thanks

--
bit1...@163.com
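For reference, the two options in the original question could be written roughly as below. This is only a sketch against the receiver-based KafkaUtils.createStream API from the spark-streaming-kafka module. The ZooKeeper quorum, consumer group, topic name, batch interval, local[8] master, and the choice of one thread per receiver in option 1 are placeholder assumptions, not values from this thread.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils

object ReceiverParallelism {
  // Assumed placeholder values; replace with your own.
  val zkQuorum = "zk1:2181"
  val groupId  = "my-consumer-group"
  val topic    = "myTopic"

  // Option 1: six receivers, each with one consumer thread (assumed),
  // unioned into a single DStream. Receivers can land on different executors.
  def manyReceivers(ssc: StreamingContext): DStream[(String, String)] = {
    val streams = (1 to 6).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, groupId, Map(topic -> 1))
    }
    ssc.union(streams)
  }

  // Option 2: a single receiver running six consumer threads.
  // All data enters the cluster through one executor.
  def manyThreads(ssc: StreamingContext): DStream[(String, String)] =
    KafkaUtils.createStream(ssc, zkQuorum, groupId, Map(topic -> 6))

  def main(args: Array[String]): Unit = {
    // Each receiver pins one core, so the master must offer more cores
    // than there are receivers or no batches will be processed.
    val conf = new SparkConf().setAppName("ReceiverParallelism").setMaster("local[8]")
    val ssc  = new StreamingContext(conf, Seconds(2))

    manyReceivers(ssc).count().print() // swap in manyThreads(ssc) to compare

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In this sketch option 1 needs at least 7 cores (6 receivers plus at least one for processing), while option 2 needs at least 2, which is the core-versus-bandwidth trade-off described above.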
Re: Re: Many Receiver vs. Many threads per Receiver
Spark Streaming has a new Kafka direct stream, to be released as an experimental feature with 1.3. It uses a low-level consumer. Not sure if it satisfies your purpose. If you want more control, it's best to create your own Receiver with the low-level Kafka API.

TD
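A minimal sketch of what using the direct stream mentioned above might look like, assuming the KafkaUtils.createDirectStream call as it shipped in the Spark 1.3 spark-streaming-kafka module. The broker list and topic name are hypothetical placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  // Builds a receiver-less Kafka stream of (key, value) string pairs.
  def directStream(ssc: StreamingContext): InputDStream[(String, String)] = {
    // Assumed broker addresses; the direct stream talks to the brokers
    // rather than going through ZooKeeper.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics      = Set("myTopic") // assumed topic name

    KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)
  }
}
```

With this approach no long-running receivers are started, so the receivers-versus-threads question does not arise in the same form: each Kafka partition is read as one partition of the resulting RDD.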
Re: Re: Many Receiver vs. Many threads per Receiver
Thanks Akhil. Not sure whether the low-level consumer (https://github.com/dibbhatt/kafka-spark-consumer) will be officially supported by Spark Streaming. So far, I don't see it mentioned or documented in the Spark Streaming programming guide.

bit1...@163.com