Re: How to run multiple instances of the same job

Chris Riccomini Wed, 20 Aug 2014 14:48:12 -0700

Hey Telles,

The Samza job can be configured to disable batching and use sync sends:


systems.kafka.producer.producer.type=sync
systems.kafka.producer.batch.num.messages=1

This is how the hello-samza job works. :)


Note that this will dramatically reduce your throughput, but if you're doing
this, you probably have a low-throughput topic anyway.
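
As a rough sketch of where those two lines sit in a job's .properties file
(the factory, ZooKeeper, and broker lines are assumptions modeled on a local
hello-samza-style setup, not taken from this thread):

```properties
# Kafka system definition (assumed local setup; adjust hosts to your cluster)
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.producer.metadata.broker.list=localhost:9092

# Send each message synchronously instead of handing it to a background sender
systems.kafka.producer.producer.type=sync
# Batch size of 1, so a send is not held back waiting for more messages
systems.kafka.producer.batch.num.messages=1
```

Only the last two settings are needed to turn off batching; the rest is
there for context.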

Cheers,
Chris

On 8/20/14 1:21 PM, "Telles Nobrega" <[email protected]> wrote:

>Chris, is there a way to completely eliminate buffering in Samza + Kafka?
>
>
>On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega <[email protected]>
>wrote:
>
>> I see. Thanks. The weird thing is that it works for some rounds and then stops.
>>
>>
>> On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <
>> [email protected]> wrote:
>>
>>> Hey Telles,
>>>
>>> The problem could occur with HDFS. I believe that LOCALIZING just means
>>> that the NM is trying to download the artifact from wherever it is (be
>>> that HTTP, HDFS, etc).
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:
>>>
>>> >Chris,
>>> >
>>> >I'm using HDFS. I will run it again to see whether the problem recurs, and
>>> >I will post if I find any problem or have more questions.
>>> >
>>> >Thanks.
>>> >
>>> >
>>> >On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <
>>> >[email protected]> wrote:
>>> >
>>> >> Hey Telles,
>>> >>
>>> >> Usually, when a job is stuck in LOCALIZING, it means that YARN is
>>> >> struggling to distribute your binary (the .tgz) to the appropriate
>>> >> NodeManagers, I think. You should check your NM logs and see if there
>>> >> are any hints about what's going on there.
>>> >>
>>> >> I've seen this in the past when the NM hangs trying to download a .tgz
>>> >> from the HTTP server for some reason.
>>> >>
>>> >> Cheers,
>>> >> Chris
>>> >>
>>> >> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>>> >>
>>> >> >I was able to fix this problem, but now I'm having another one. I'm
>>> >> >using a script that starts Kafka, deploys Samza jobs, stops them, kills
>>> >> >Kafka, and deletes the configurations in ZooKeeper and the kafka-log
>>> >> >files. Then it starts over again. I see that sometimes jobs don't start
>>> >> >running; they stay in the accepted state with info LOCALIZING. What can
>>> >> >be the cause of that?
>>> >> >
>>> >> >Thanks.
>>> >> >On 15 Aug 2014, at 19:18, Chris Riccomini
>>> >> ><[email protected]> wrote:
>>> >> >
>>> >> >> Hey Telles,
>>> >> >>
>>> >> >> If you set yarn.container.count to 5, you should get 5 containers.
>>> >> >> The two cases where you don't are:
>>> >> >>
>>> >> >> 1. The grid is at capacity, and doesn't have the memory to fulfill
>>> >> >> all container requests.
>>> >> >> 2. You set yarn.container.count higher than the number of partitions
>>> >> >> that your input stream has.
>>> >> >>
>>> >> >> Cheers,
>>> >> >> Chris
>>> >> >>
>>> >> >> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
>>> >> >>
>>> >> >>> Hi Chris,
>>> >> >>>
>>> >> >>> I started playing with yarn.container.count and set it to 5.
>>> >> >>>
>>> >> >>> At first I thought I had to compile the package again and republish
>>> >> >>> it to HDFS, because I couldn't run 5 containers.
>>> >> >>> Then I recompiled, but I still only got 3 containers. Is that normal
>>> >> >>> behaviour?
>>> >> >>>
>>> >> >>> Thanks.
>>> >> >>>
>>> >> >>>
>>> >> >>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega
>>> >> >>> <[email protected]> wrote:
>>> >> >>>
>>> >> >>>> Thanks Chris, I will take a look at these links and come back if I
>>> >> >>>> have more questions.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
>>> >> >>>> [email protected]> wrote:
>>> >> >>>>
>>> >> >>>>> Hey Telles,
>>> >> >>>>>
>>> >> >>>>>>> Should I use many kafka brokers or one will suffice?
>>> >> >>>>>
>>> >> >>>>> The number of brokers you use is dependent on the number of
>>> >> >>>>> messages/sec you're going to receive, the size of those messages,
>>> >> >>>>> and how long you're going to retain them.
>>> >> >>>>>
>>> >> >>>>> Here is a good blog post on Kafka performance that should give you
>>> >> >>>>> some idea of the numbers:
>>> >> >>>>>
>>> >> >>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>> >> >>>>>
>>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
>>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
>>> >> >>>>>
>>> >> >>>>> You should adjust yarn.container.count to increase the parallelism
>>> >> >>>>> of your job. By default, you get one container, but you can adjust
>>> >> >>>>> this up to the total number of input partitions that you have. Have
>>> >> >>>>> a look here for some details about how Samza's parallelism works:
>>> >> >>>>>
>>> >> >>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>> >> >>>>>
>>> >> >>>>> Cheers,
>>> >> >>>>> Chris
>>> >> >>>>>
>>> >> >>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>> >> >>>>>
>>> >> >>>>>> Should I use many Kafka brokers, or will one suffice?
>>> >> >>>>>>
>>> >> >>>>>> Thanks
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega
>>> >> >>>>>> <[email protected]> wrote:
>>> >> >>>>>>
>>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
>>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
>>> >> >>>>>>>
>>> >> >>>>>>> Thanks,
>>> >> >>>>>>>
>>> >> >>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>>> *"Does one kafka-broker handle this much messages per second?"*
>>> >> >>>>>>>>
>>> >> >>>>>>>> I believe @Chris has a better answer to this.
>>> >> >>>>>>>>
>>> >> >>>>>>>> *"I have one job that get this messages and another that reads from
>>> >> >>>>>>>> the output of the first job that does some more processing."*
>>> >> >>>>>>>>
>>> >> >>>>>>>>   Why not use one job to get the messages and process them?
>>> >> >>>>>>>>
>>> >> >>>>>>>> *"when I change a configuration of one my jobs do I need to recompile
>>> >> >>>>>>>> it and send the new tar.gz to hdfs or just change the deploy/samza
>>> >> >>>>>>>> config and it should work."*
>>> >> >>>>>>>>
>>> >> >>>>>>>>   No, you don't need to recompile. Change the config and run-job. It
>>> >> >>>>>>>>   will work.
>>> >> >>>>>>>>
>>> >> >>>>>>>> Thanks.
>>> >> >>>>>>>>
>>> >> >>>>>>>> Cheers,
>>> >> >>>>>>>>
>>> >> >>>>>>>> Fang, Yan
>>> >> >>>>>>>> [email protected]
>>> >> >>>>>>>> +1 (206) 849-4108
>>> >> >>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega
>>> >> >>>>>>>> <[email protected]> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>>> Not completely related to the topic of the question, but when I
>>> >> >>>>>>>>> change a configuration of one of my jobs, do I need to recompile it
>>> >> >>>>>>>>> and send the new tar.gz to HDFS, or can I just change the
>>> >> >>>>>>>>> deploy/samza config and it should work?
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Thanks
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega
>>> >> >>>>>>>>> <[email protected]> wrote:
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run Samza
>>> >> >>>>>>>>>> with different input rates. First I'm running with 420
>>> >> >>>>>>>>>> messages/second, and I scale up to 33200 messages/second.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Does one Kafka broker handle this many messages per second?
>>> >> >>>>>>>>>> Second, what is the best way to read this many messages into Samza?
>>> >> >>>>>>>>>> I have one job that gets these messages and another that reads from
>>> >> >>>>>>>>>> the output of the first job and does some more processing. Is the
>>> >> >>>>>>>>>> best way to use more containers and split the Kafka topics into
>>> >> >>>>>>>>>> partitions (the same number as containers), or is there a better way
>>> >> >>>>>>>>>> to do this?
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Thanks in advance,
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> --
>>> >> >>>>>>>>>> ------------------------------------------
>>> >> >>>>>>>>>> Telles Mota Vidal Nobrega
>>> >> >>>>>>>>>> M.sc. Candidate at UFCG
>>> >> >>>>>>>>>> B.sc. in Computer Science at UFCG
>>> >> >>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>
>>
>>
>
>
>
