As I understand the matter:

Option 1) has benefits when you think your network bandwidth may be a
bottleneck, because Spark opens several network connections, possibly on
several different physical machines.

Option 2) - as you already pointed out - has the benefit that you occupy
fewer worker cores with receiver tasks.
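To make the trade-off concrete, a minimal sketch of the two options with the Spark 1.x `KafkaUtils.createStream` API (the ZooKeeper address "zkhost:2181", group id "mygroup", and an existing `StreamingContext` named `ssc` are placeholder assumptions):

```scala
import org.apache.spark.streaming.kafka.KafkaUtils

// Option 1: six receivers, one consumer thread each. Occupies six
// worker cores, but the receivers can be scheduled across machines,
// spreading the network load.
val streams = (1 to 6).map { _ =>
  KafkaUtils.createStream(ssc, "zkhost:2181", "mygroup", Map("myTopic" -> 1))
}
val unified = ssc.union(streams)

// Option 2: one receiver with six consumer threads. Occupies a single
// worker core, so all incoming traffic flows through one machine.
val stream =
  KafkaUtils.createStream(ssc, "zkhost:2181", "mygroup", Map("myTopic" -> 6))
```

With option 1 you typically `union` the streams before processing, otherwise each receiver's DStream is handled separately.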

Regards,
Jeff

2015-02-26 9:38 GMT+01:00 bit1...@163.com <bit1...@163.com>:

> Sure, Thanks Tathagata!
>
> ------------------------------
> bit1...@163.com
>
>
> *From:* Tathagata Das <t...@databricks.com>
> *Date:* 2015-02-26 14:47
> *To:* bit1...@163.com
> *CC:* Akhil Das <ak...@sigmoidanalytics.com>; user <user@spark.apache.org>
> *Subject:* Re: Re: Many Receiver vs. Many threads per Receiver
> Spark Streaming has a new Kafka direct stream, to be released as an
> experimental feature with 1.3. It uses a low-level consumer. Not sure if
> it satisfies your purpose.
> If you want more control, it's best to create your own Receiver with the
> low-level Kafka API.
>
> TD
>
> On Tue, Feb 24, 2015 at 12:09 AM, bit1...@163.com <bit1...@163.com> wrote:
>
>> Thanks Akhil.
>> Not sure whether the low-level consumer
>> <https://github.com/dibbhatt/kafka-spark-consumer> will be officially
>> supported by Spark Streaming. So far, I don't see it mentioned/documented
>> in the Spark Streaming programming guide.
>>
>> ------------------------------
>> bit1...@163.com
>>
>>
>> *From:* Akhil Das <ak...@sigmoidanalytics.com>
>> *Date:* 2015-02-24 16:21
>> *To:* bit1...@163.com
>> *CC:* user <user@spark.apache.org>
>> *Subject:* Re: Many Receiver vs. Many threads per Receiver
>> I believe when you go with 1, it will distribute the consumers across your
>> cluster (possibly on 6 machines), but I still don't see a way to tell
>> which partition each will consume from, etc. If you are looking for a
>> consumer where you can specify the partition details and so on, then you
>> are better off with the low-level consumer
>> <https://github.com/dibbhatt/kafka-spark-consumer>.
>>
>>
>>
>> Thanks
>> Best Regards
>>
>> On Tue, Feb 24, 2015 at 9:36 AM, bit1...@163.com <bit1...@163.com> wrote:
>>
>>> Hi,
>>> I am experimenting with Spark Streaming and Kafka integration. To read
>>> messages from Kafka in parallel, there are basically two ways:
>>> 1. Create many Receivers, like (1 to 6).map(_ =>
>>> KafkaUtils.createStream(...)).
>>> 2. Specify many threads when calling KafkaUtils.createStream, e.g. a
>>> topic map of Map("myTopic" -> 6); this creates one receiver with 6
>>> reading threads.
>>>
>>> My question is which option is better. Option 2 sounds better to me
>>> because it saves a lot of cores (one Receiver occupies one core), but I
>>> learned from somewhere else that option 1 is better, so I would like to
>>> hear how you elaborate on this. Thanks.
>>>
>>> ------------------------------
>>> bit1...@163.com
>>>
>>
>>
>
