I agree, Gerard. Thanks for pointing this out.

Dib

On Thu, Sep 11, 2014 at 5:28 PM, Gerard Maas <gerard.m...@gmail.com> wrote:

> This pattern works.
>
> One note, though: use 'union' only if you need to group the data from all
> RDDs into one RDD for processing (like a distinct count, or when you need a
> groupBy). If your process can be parallelized over every stream of incoming
> data, I suggest you just apply the required transformations on every DStream
> and avoid 'union' altogether.
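>
> For example, a minimal sketch (assuming `streams` is the Seq of
> per-partition Kafka DStreams from Tim's example below; the transformations
> are illustrative):
>
> // Independent per-stream processing: each DStream keeps its own
> // lineage, and no shuffle is introduced.
> streams.foreach { stream =>
>   stream.map { case (_, value) => value.length }.print()
> }
>
> // 'union' only when a global operation across all streams is needed,
> // e.g. a distinct count over every partition's data.
> val all = ssc.union(streams)
> all.map { case (_, value) => value }.countByValue().print()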
>
> -kr, Gerard.
>
>
>
> On Wed, Sep 10, 2014 at 8:17 PM, Tim Smith <secs...@gmail.com> wrote:
>
>> How are you creating your Kafka streams in Spark?
>>
>> If you have 10 partitions for a topic, you can call "createStream" ten
>> times to create 10 parallel receivers/executors and then use "union" to
>> combine all the DStreams.
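>>
>> A rough sketch, assuming an existing StreamingContext `ssc` (host, group,
>> and topic names are placeholders):
>>
>> import org.apache.spark.streaming.kafka.KafkaUtils
>>
>> // One receiver per createStream call; ten calls -> ten parallel
>> // receivers, one per Kafka partition.
>> val streams = (1 to 10).map { _ =>
>>   KafkaUtils.createStream(ssc, "zkhost:2181", "my-consumer-group",
>>                           Map("mytopic" -> 1))  // 1 consumer thread each
>> }
>> // Combine into a single DStream if downstream code expects one.
>> val unified = ssc.union(streams)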
>>
>>
>>
>> On Wed, Sep 10, 2014 at 7:16 AM, richiesgr <richie...@gmail.com> wrote:
>>
>>> Hi (my previous post has been used by someone else)
>>>
>>> I'm building an application that reads events from a Kafka stream. In
>>> production we have 5 consumers that share 10 partitions.
>>> But with Spark Streaming's Kafka integration, only 1 worker acts as a
>>> consumer and then distributes the tasks to the other workers, so only 1
>>> machine is consuming. I need more, because a single consumer means lag.
>>>
>>> Do you have any idea what I can do? Another interesting point: the master
>>> is not loaded at all; it never goes above 10% CPU.
>>>
>>> I've tried increasing queued.max.message.chunks on the Kafka client to
>>> read more records, thinking it would speed up the reads, but I only get:
>>>
>>> ERROR consumer.ConsumerFetcherThread:
>>> [ConsumerFetcherThread-SparkEC2_ip-10-138-59-194.ec2.internal-1410182950783-5c49c8e8-0-174167372],
>>> Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 73;
>>> ClientId: SparkEC2-ConsumerFetcherThread-SparkEC2_ip-10-138-59-194.ec2.internal-1410182950783-5c49c8e8-0-174167372;
>>> ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes; RequestInfo:
>>> [IA2,7] -> PartitionFetchInfo(929838589,1048576),
>>> [IA2,6] -> PartitionFetchInfo(929515796,1048576),
>>> [IA2,9] -> PartitionFetchInfo(929577946,1048576),
>>> [IA2,8] -> PartitionFetchInfo(930751599,1048576),
>>> [IA2,2] -> PartitionFetchInfo(926457704,1048576),
>>> [IA2,5] -> PartitionFetchInfo(930774385,1048576),
>>> [IA2,0] -> PartitionFetchInfo(929913213,1048576),
>>> [IA2,3] -> PartitionFetchInfo(929268891,1048576),
>>> [IA2,4] -> PartitionFetchInfo(929949877,1048576),
>>> [IA2,1] -> PartitionFetchInfo(930063114,1048576)
>>> java.lang.OutOfMemoryError: Java heap space
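>>>
>>> A minimal sketch of how I pass that property (values are illustrative),
>>> using the receiver-based KafkaUtils.createStream overload that takes
>>> kafkaParams. I suspect the OOM happens because each queued chunk can
>>> buffer up to fetch.message.max.bytes (1048576 above), so raising
>>> queued.max.message.chunks multiplies heap usage per receiver:
>>>
>>> import kafka.serializer.StringDecoder
>>> import org.apache.spark.storage.StorageLevel
>>> import org.apache.spark.streaming.kafka.KafkaUtils
>>>
>>> val kafkaParams = Map(
>>>   "zookeeper.connect"         -> "zkhost:2181",  // placeholder
>>>   "group.id"                  -> "SparkEC2",
>>>   "queued.max.message.chunks" -> "10",           // the setting I raised
>>>   "fetch.message.max.bytes"   -> "1048576")
>>> val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
>>>   ssc, kafkaParams, Map("IA2" -> 1), StorageLevel.MEMORY_AND_DISK_SER)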
>>>
>>> Does anyone have any ideas?
>>> Thanks
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-scale-more-consumer-to-Kafka-stream-tp13883.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>
