Re: Re: Many Receiver vs. Many threads per Receiver

2015-02-26 Thread bit1...@163.com
Sure, Thanks Tathagata! 



bit1...@163.com
 
From: Tathagata Das
Date: 2015-02-26 14:47
To: bit1...@163.com
CC: Akhil Das; user
Subject: Re: Re: Many Receiver vs. Many threads per Receiver
Spark Streaming has a new Kafka direct stream, to be released as an experimental 
feature with 1.3. It uses a low-level consumer. Not sure if it satisfies your 
purpose. 
If you want more control, it's best to create your own Receiver with the low-level 
Kafka API. 

TD
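
For reference, a minimal sketch of the direct stream API as it appears in 1.3. This assumes the spark-streaming-kafka artifact is on the classpath; the broker address and topic name are placeholders:

```scala
// Sketch of the experimental Kafka direct stream in Spark 1.3.
// "broker1:9092" and "myTopic" are placeholder assumptions.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("myTopic")

    // createDirectStream queries Kafka partitions directly (no Receiver),
    // creating one RDD partition per Kafka partition.
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

    stream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the direct stream has no Receiver, parallelism follows the topic's Kafka partition count rather than a receiver or thread count.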

On Tue, Feb 24, 2015 at 12:09 AM, bit1...@163.com bit1...@163.com wrote:
Thanks Akhil.
Not sure whether the low-level consumer (https://github.com/dibbhatt/kafka-spark-consumer) 
will be officially supported by Spark Streaming. So far, I don't see it 
mentioned/documented in the Spark Streaming programming guide.



bit1...@163.com
 
From: Akhil Das
Date: 2015-02-24 16:21
To: bit1...@163.com
CC: user
Subject: Re: Many Receiver vs. Many threads per Receiver
I believe when you go with 1, it will distribute the consumers across your 
cluster (possibly on 6 machines), but I still don't see a way to tell which 
partition it will consume from, etc. If you are looking to have a consumer 
where you can specify the partition details and so on, then you are better off 
with the low-level consumer (https://github.com/dibbhatt/kafka-spark-consumer).



Thanks
Best Regards

On Tue, Feb 24, 2015 at 9:36 AM, bit1...@163.com bit1...@163.com wrote:
Hi,
I am experimenting with Spark Streaming and Kafka integration. To read messages 
from Kafka in parallel, there are basically two ways:
1. Create many Receivers, e.g. (1 to 6).map(_ => KafkaUtils.createStream(...)). 
2. Specify many threads when calling KafkaUtils.createStream, e.g. with a topic 
map like Map(myTopic -> 6); this creates one Receiver with 6 reading threads.

My question is which option is better. Option 2 sounds better to me because it 
saves a lot of cores (each Receiver occupies one core), but I learned from 
somewhere else that option 1 is better, so I would like to hear how you 
elaborate on this. Thanks.



bit1...@163.com
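

The two options discussed in this thread can be sketched roughly as follows (the topic name, consumer group, and ZooKeeper quorum are placeholder assumptions, and the spark-streaming-kafka artifact is assumed to be on the classpath):

```scala
// Sketch of the two receiver-based approaches from the question.
// "zk1:2181", "my-group", and "myTopic" are placeholder assumptions.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ReceiverSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("receiver-sketch"), Seconds(10))

    // Option 1: six Receivers, each running as a separate long-lived task
    // (occupying one core each), unioned into a single stream.
    val streams = (1 to 6).map { _ =>
      KafkaUtils.createStream(ssc, "zk1:2181", "my-group", Map("myTopic" -> 1))
    }
    val unioned = ssc.union(streams)

    // Option 2: a single Receiver with six consumer threads for the topic.
    val single =
      KafkaUtils.createStream(ssc, "zk1:2181", "my-group", Map("myTopic" -> 6))

    unioned.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Option 1 spreads the network connections over up to six machines at the cost of six cores; option 2 uses one core but keeps all reading threads on one machine.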




Re: Re: Many Receiver vs. Many threads per Receiver

2015-02-26 Thread Jeffrey Jedele
As I understand the matter:

Option 1) has benefits when you think that your network bandwidth may be a
bottleneck, because Spark opens several network connections on possibly
several different physical machines.

Option 2), as you already pointed out, has the benefit that you occupy
fewer worker cores with receiver tasks.

Regards,
Jeff



Re: Re: Many Receiver vs. Many threads per Receiver

2015-02-25 Thread Tathagata Das
Spark Streaming has a new Kafka direct stream, to be released as an
experimental feature with 1.3. It uses a low-level consumer. Not sure if
it satisfies your purpose.
If you want more control, it's best to create your own Receiver with the
low-level Kafka API.

TD


Re: Re: Many Receiver vs. Many threads per Receiver

2015-02-24 Thread bit1...@163.com
Thanks Akhil.
Not sure whether the low-level consumer (https://github.com/dibbhatt/kafka-spark-consumer) 
will be officially supported by Spark Streaming. So far, I don't see it 
mentioned/documented in the Spark Streaming programming guide.



bit1...@163.com
 