Hi Cody,
Thank you for the quick response.
The problem was that my application did not have enough resources (all the 
executors were busy), so Spark ran these tasks sequentially. Once I added 
more executors to the application, everything worked fine.
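In case it helps someone else, here is a sketch of the kind of spark-submit 
change that fixed it for me (the numbers, class name, and jar below are 
illustrative placeholders, not my exact job):

  spark-submit \
    --master yarn-cluster \
    --num-executors 16 \
    --executor-cores 1 \
    --executor-memory 2g \
    --class com.example.StreamingJob \
    streaming-job.jar

With more single-core executors than partitions, the 8 Kafka partitions no 
longer had to wait for free slots.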
Thank you anyway.
P.S. By the way, thank you for the great video lecture about directStream: 
https://youtu.be/fXnNEq1v3VA



04.09.2015, 17:03, "Cody Koeninger" <c...@koeninger.org>:
> The direct stream just makes a Spark partition per Kafka partition, so if 
> those partitions are not getting evenly distributed among executors, 
> something else is probably wrong with your configuration.
>
> If you replace the Kafka stream with a dummy RDD created with, e.g., 
> sc.parallelize, what happens?
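>
> For example, a rough sketch of that test (untested; assumes sc is your 
> SparkContext, and the sleep just stands in for per-partition work):
>
>   val dummy = sc.parallelize(1 to 8, 8) // 8 partitions, like the topic
>   dummy.foreachPartition { _ =>
>     Thread.sleep(5000) // then check the executors page in the Spark UI
>   }
>
> If those 8 partitions also run on one executor, the problem is the 
> resource allocation rather than the Kafka stream itself.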
>
> Also, are you running Kafka on one of the YARN executors, or on a 
> different machine?
>
> On Fri, Sep 4, 2015 at 5:17 AM, ponkin <alexey.pon...@ya.ru> wrote:
>> Hi,
>> I am trying to read a Kafka topic with the new directStream method in 
>> KafkaUtils.
>> I have a Kafka topic with 8 partitions.
>> I am running the streaming job on YARN with 8 executors, 1 core each.
>> I noticed that Spark reads all of the topic's partitions in one executor, 
>> sequentially - this is obviously not what I want.
>> I want Spark to read all partitions in parallel.
>> How can I achieve that?
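>>
>> For reference, a minimal sketch of how I create the stream (the broker 
>> address and topic name below are placeholders, not my real values):
>>
>>   import kafka.serializer.StringDecoder
>>   import org.apache.spark.streaming.kafka.KafkaUtils
>>
>>   // ssc is an existing org.apache.spark.streaming.StreamingContext
>>   val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
>>   val topics = Set("my-topic") // topic with 8 partitions
>>   val stream = KafkaUtils.createDirectStream[String, String,
>>     StringDecoder, StringDecoder](ssc, kafkaParams, topics)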
>>
>> Thank you in advance.
>>
>>
