Re: How to run multiple instances of the same job

Telles Nobrega Sat, 16 Aug 2014 22:42:06 -0700

I was able to fix this problem, but now I’m having another one. I’m using a
script that starts Kafka, deploys the Samza jobs, stops them, kills Kafka, and
deletes the configuration in ZooKeeper and the kafka-log files. Then it starts
over again. I see that sometimes the jobs don’t start running; they stay in the
ACCEPTED state with the info LOCALIZING. What could be the cause of that?


Thanks.
On 15 Aug 2014, at 19:18, Chris Riccomini <[email protected]> 
wrote:

> Hey Telles,
> 
> If you set yarn.container.count to 5, you should get 5 containers. The two
> cases where you don't are:
> 
> 1. The grid is at capacity, and doesn't have the memory to fulfill all
> container requests.
> 2. You set yarn.container.count higher than the number of partitions that
> your input stream has.
> 
> Cheers,
> Chris
> 
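
A minimal sketch of the second case above, ignoring grid capacity (the first case). The function names and the round-robin assignment are assumptions made for illustration and are not taken from Samza's code; the partition count of 3 is a hypothetical value, not something stated in this thread.

```python
from typing import Dict, List


def effective_containers(requested: int, input_partitions: int) -> int:
    """Cap the requested container count at the number of input partitions."""
    if requested < 1 or input_partitions < 1:
        raise ValueError("both values must be positive")
    return min(requested, input_partitions)


def assign_partitions(requested: int, input_partitions: int) -> Dict[int, List[int]]:
    """Spread partition ids over the containers round-robin (illustrative only)."""
    containers = effective_containers(requested, input_partitions)
    assignment: Dict[int, List[int]] = {c: [] for c in range(containers)}
    for partition in range(input_partitions):
        assignment[partition % containers].append(partition)
    return assignment


if __name__ == "__main__":
    # yarn.container.count=5 against a hypothetical 3-partition input stream
    print(effective_containers(5, 3))
    print(assign_partitions(5, 3))
```

Under these assumptions, asking for more containers than partitions leaves the extra containers with nothing to read, so raising yarn.container.count only helps if the input topic has at least that many partitions.
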
> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
> 
>> Hi Chris,
>> 
>> I started playing with the yarn.container.count and set it to 5.
>> 
>> At first I thought I had to compile the package again and republish it to
>> HDFS because I couldn't run 5 containers.
>> Then I recompiled, but I still only got 3 containers. Is that normal
>> behaviour?
>> 
>> Thanks.
>> 
>> 
>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]>
>> wrote:
>> 
>>> Thanks Chris, I will take a look at these links and come back if I
>>> have more questions.
>>> 
>>> 
>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
>>> [email protected]> wrote:
>>> 
>>>> Hey Telles,
>>>> 
>>>>>> Should I use many kafka brokers or one will suffice?
>>>> 
>>>> The number of brokers you use is dependent on the number of messages/sec
>>>> you're going to receive, the size of those messages, and how long you're
>>>> going to retain them.
>>>> 
>>>> Here is a good blog post on Kafka performance that should give you some
>>>> idea of the numbers:
>>>> 
>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>>> 
>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>> instances of this job so I could process a heavy load of messages?
>>>> 
>>>> You should adjust the yarn.container.count to increase the parallelism of
>>>> your job. By default, you get one container, but you can adjust this up to
>>>> the total number of input partitions that you have. Have a look here for
>>>> some details about how Samza's parallelism works:
>>>> 
>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>>> 
>>>>> Should I use many kafka brokers or one will suffice?
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> 
>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>> instances of this job so I could process a heavy load of messages?
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>>>>> 
>>>>>>> *"Does one kafka-broker handle this many messages per second?"*
>>>>>>> 
>>>>>>> I believe @Chris has a better answer about this.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> *"I have one job that gets these messages and another that reads from
>>>>>>> the output of the first job and does some more processing."*
>>>>>>> 
>>>>>>>   Why not use one job to get the messages and process them?
>>>>>>> 
>>>>>>> *"when I change a configuration of one of my jobs, do I need to
>>>>>>> recompile it and send the new tar.gz to HDFS, or can I just change the
>>>>>>> deploy/samza config and it should work?"*
>>>>>>> 
>>>>>>>   No, you don't need to recompile. Change the config and run run-job
>>>>>>> again. It will work.
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Fang, Yan
>>>>>>> [email protected]
>>>>>>> +1 (206) 849-4108
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Not completely related to the topic of the question, but when I change
>>>>>>>> a configuration of one of my jobs, do I need to recompile it and send
>>>>>>>> the new tar.gz to HDFS, or can I just change the deploy/samza config
>>>>>>>> and it should work?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run Samza with
>>>>>>>>> different input rates. First I'm running with 420 messages/second, and
>>>>>>>>> I scale up to 33200 messages/second.
>>>>>>>>> 
>>>>>>>>> Does one kafka-broker handle this many messages per second?
>>>>>>>>> Second, what is the best way to read this many messages into Samza? I
>>>>>>>>> have one job that gets these messages and another that reads from the
>>>>>>>>> output of the first job and does some more processing. Is the best way
>>>>>>>>> to use more containers and split the Kafka topics into partitions (the
>>>>>>>>> same number as containers), or is there a better way to do this?
>>>>>>>>> 
>>>>>>>>> Thanks in advance,
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> ------------------------------------------
>>>>>>>>> Telles Mota Vidal Nobrega
>>>>>>>>> M.sc. Candidate at UFCG
>>>>>>>>> B.sc. in Computer Science at UFCG
>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
