Re: How to run multiple instances of the same job

Telles Nobrega Thu, 21 Aug 2014 16:07:18 -0700

Thanks. I need to send a lot of messages to Kafka, and I'm using a producer
that connects to Kafka to send them. To increase the rate I'm running many
producers, but it seems Kafka is not accepting them all. Is there a way
to work around this? I need something like 30000 messages per second.


Thanks
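
A back-of-the-envelope sketch of why this target is hard to reach if the
producers use sync sends with a batch size of 1 (the settings quoted below):
each producer then waits for one round trip per message, so its rate is capped
at roughly 1 / round-trip time. The function name and the 2 ms round-trip
figure are illustrative assumptions, not measurements from this thread.

```python
import math


def producers_needed(target_rate: float, round_trip_ms: float) -> int:
    """Estimate how many sync, unbatched producers reach target_rate (msgs/sec).

    Assumes each producer sends one message, waits round_trip_ms for the
    acknowledgement, and only then sends the next one.
    """
    if target_rate <= 0 or round_trip_ms <= 0:
        raise ValueError("target_rate and round_trip_ms must be positive")
    # One message per round trip gives the per-producer ceiling in msgs/sec.
    per_producer_rate = 1000.0 / round_trip_ms
    return math.ceil(target_rate / per_producer_rate)


if __name__ == "__main__":
    # Hypothetical 2 ms round trip: 500 msgs/sec per producer.
    print(producers_needed(30000, 2.0))
```

Under that assumption, the estimate scales linearly with the round-trip time,
which is why batching (fewer round trips per message) usually matters more
than adding producers.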


On Wed, Aug 20, 2014 at 6:47 PM, Chris Riccomini <
[email protected]> wrote:

> Hey Telles,
>
> The Samza job can be configured to disable batching and use sync sends:
>
> systems.kafka.producer.producer.type=sync
> systems.kafka.producer.batch.num.messages=1
>
> This is how the hello-samza job works. :)
>
>
> Note that it will dramatically affect your throughput, but if you're doing
> this, you probably have a low throughput topic anyway.
>
> Cheers,
> Chris
>
> On 8/20/14 1:21 PM, "Telles Nobrega" <[email protected]> wrote:
>
> >Chris, is there a way to completely eliminate buffering in Samza + Kafka?
> >
> >
> >On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega <[email protected]>
> >wrote:
> >
> >> I see. Thanks. The weird thing is that it works for some rounds and then stops.
> >>
> >>
> >> On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <
> >> [email protected]> wrote:
> >>
> >>> Hey Telles,
> >>>
> >>> The problem could occur with HDFS. I believe that LOCALIZING just means
> >>> that the NM is trying to download the artifact from wherever it is (be
> >>> that HTTP, HDFS, etc).
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:
> >>>
> >>> >Chris,
> >>> >
> >>> >I'm using HDFS. I will run it again to see if the problem happens, and I
> >>> >will post if I find any problem or have more questions.
> >>> >
> >>> >Thanks.
> >>> >
> >>> >
> >>> >On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <
> >>> >[email protected]> wrote:
> >>> >
> >>> >> Hey Telles,
> >>> >>
> >>> >> Usually, when a job is stuck in LOCALIZING, it means that YARN is
> >>> >> struggling to distribute your binary (the .tgz) to the appropriate
> >>> >> NodeManagers, I think. You should check your NM logs and see if there
> >>> >> are any hints about what's going on there.
> >>> >>
> >>> >> I've seen this in the past when the NM hangs trying to download a
> >>> >> .tgz from the HTTP server for some reason.
> >>> >>
> >>> >> Cheers,
> >>> >> Chris
> >>> >>
> >>> >> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
> >>> >>
> >>> >> >I was able to fix this problem; now I'm having another one. I'm using
> >>> >> >a script that starts Kafka, deploys Samza jobs, stops them, kills
> >>> >> >Kafka, and deletes the configurations in ZooKeeper and the kafka-log
> >>> >> >files. Then it starts over again. I see that sometimes jobs don't
> >>> >> >start running; they stay in the accepted state with info LOCALIZING.
> >>> >> >What can be the cause of that?
> >>> >> >
> >>> >> >Thanks.
> >>> >> >On 15 Aug 2014, at 19:18, Chris Riccomini
> >>> >> ><[email protected]> wrote:
> >>> >> >
> >>> >> >> Hey Telles,
> >>> >> >>
> >>> >> >> If you set yarn.container.count to 5, you should get 5 containers.
> >>> >> >> The two cases where you don't are:
> >>> >> >>
> >>> >> >> 1. The grid is at capacity, and doesn't have the memory to fulfill
> >>> >> >> all container requests.
> >>> >> >> 2. You set yarn.container.count higher than the number of
> >>> >> >> partitions that your input stream has.
> >>> >> >>
> >>> >> >> Cheers,
> >>> >> >> Chris
> >>> >> >>
> >>> >> >> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
> >>> >> >>
> >>> >> >>> Hi Chris,
> >>> >> >>>
> >>> >> >>> I started playing with the yarn.container.count and set it to 5.
> >>> >> >>>
> >>> >> >>> At first I thought I had to compile the package again and
> >>> >> >>> republish to hdfs because I couldn't run 5 containers.
> >>> >> >>> Then I recompiled but I still only got 3 containers, is that
> >>> >> >>> normal behaviour?
> >>> >> >>>
> >>> >> >>> Thanks.
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]> wrote:
> >>> >> >>>
> >>> >> >>>> Thanks Chris, I will take a look at these links and come back if I
> >>> >> >>>> have more questions.
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
> >>> >> >>>> [email protected]> wrote:
> >>> >> >>>>
> >>> >> >>>>> Hey Telles,
> >>> >> >>>>>
> >>> >> >>>>>>> Should I use many kafka brokers or one will suffice?
> >>> >> >>>>>
> >>> >> >>>>> The number of brokers you use is dependent on the number of
> >>> >> >>>>> messages/sec you're going to receive, the size of those messages,
> >>> >> >>>>> and how long you're going to retain them.
> >>> >> >>>>>
> >>> >> >>>>> Here is a good blog post on Kafka performance that should give you
> >>> >> >>>>> some idea of the numbers:
> >>> >> >>>>>
> >>> >> >>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> >>> >> >>>>>
> >>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
> >>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
> >>> >> >>>>>
> >>> >> >>>>> You should adjust the yarn.container.count to increase the
> >>> >> >>>>> parallelism of your job. By default, you get one container, but you
> >>> >> >>>>> can adjust this up to the total number of input partitions that you
> >>> >> >>>>> have. Have a look here for some details about how Samza's
> >>> >> >>>>> parallelism works:
> >>> >> >>>>>
> >>> >> >>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
> >>> >> >>>>>
> >>> >> >>>>>
> >>> >> >>>>>
> >>> >> >>>>>
> >>> >> >>>>> Cheers,
> >>> >> >>>>> Chris
> >>> >> >>>>>
> >>> >> >>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
> >>> >> >>>>>
> >>> >> >>>>>> Should I use many kafka brokers or one will suffice?
> >>> >> >>>>>>
> >>> >> >>>>>> Thanks
> >>> >> >>>>>>
> >>> >> >>>>>>
> >>> >> >>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
> >>> >> >>>>>>
> >>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
> >>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
> >>> >> >>>>>>>
> >>> >> >>>>>>> Thanks,
> >>> >> >>>>>>>
> >>> >> >>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
> >>> >> >>>>>>>
> >>> >> >>>>>>>> *"Does one kafka-broker handle this many messages per second?"*
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> I believe @Chris has a better answer about this.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> *"I have one job that gets these messages and another that reads
> >>> >> >>>>>>>> from the output of the first job and does some more processing."*
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>   Why not use one job to get the messages and process them?
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> *"when I change a configuration of one of my jobs do I need to
> >>> >> >>>>>>>> recompile it and send the new tar.gz to hdfs or just change the
> >>> >> >>>>>>>> deploy/samza config and it should work."*
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>   No, you don't need to recompile. Change the config and run-job.
> >>> >> >>>>>>>> It will work.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> Thanks.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> Cheers,
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> Fang, Yan
> >>> >> >>>>>>>> [email protected]
> >>> >> >>>>>>>> +1 (206) 849-4108
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>> Not completely related to the topic of the question, but when I
> >>> >> >>>>>>>>> change a configuration of one of my jobs, do I need to recompile
> >>> >> >>>>>>>>> it and send the new tar.gz to hdfs, or can I just change the
> >>> >> >>>>>>>>> deploy/samza config and it should work?
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> Thanks
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run Samza
> >>> >> >>>>>>>>>> with different input rates. First I'm running with 420
> >>> >> >>>>>>>>>> messages/second and I scale up to 33200 messages/second.
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>> Does one kafka-broker handle this many messages per second?
> >>> >> >>>>>>>>>> Second, what is the best way to read this many messages into
> >>> >> >>>>>>>>>> Samza? I have one job that gets these messages and another that
> >>> >> >>>>>>>>>> reads from the output of the first job and does some more
> >>> >> >>>>>>>>>> processing. Is the best way to use more containers and split the
> >>> >> >>>>>>>>>> Kafka topics into partitions (the same number as containers), or
> >>> >> >>>>>>>>>> is there a better way to do this?
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>> Thanks in advance,
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>> --
> >>> >> >>>>>>>>>> ------------------------------------------
> >>> >> >>>>>>>>>> Telles Mota Vidal Nobrega
> >>> >> >>>>>>>>>> M.sc. Candidate at UFCG
> >>> >> >>>>>>>>>> B.sc. in Computer Science at UFCG
> >>> >> >>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> --
> >>> >> >>>>>>>>> ------------------------------------------
> >>> >> >>>>>>>>> Telles Mota Vidal Nobrega
> >>> >> >>>>>>>>> M.sc. Candidate at UFCG
> >>> >> >>>>>>>>> B.sc. in Computer Science at UFCG
> >>> >> >>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>
> >>> >> >>>>>>>
> >>> >> >>>>>>
> >>> >> >>>>>>
> >>> >> >>>>>> --
> >>> >> >>>>>> ------------------------------------------
> >>> >> >>>>>> Telles Mota Vidal Nobrega
> >>> >> >>>>>> M.sc. Candidate at UFCG
> >>> >> >>>>>> B.sc. in Computer Science at UFCG
> >>> >> >>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>> >> >>>>>
> >>> >> >>>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> --
> >>> >> >>>> ------------------------------------------
> >>> >> >>>> Telles Mota Vidal Nobrega
> >>> >> >>>> M.sc. Candidate at UFCG
> >>> >> >>>> B.sc. in Computer Science at UFCG
> >>> >> >>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>> >> >>>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> --
> >>> >> >>> ------------------------------------------
> >>> >> >>> Telles Mota Vidal Nobrega
> >>> >> >>> M.sc. Candidate at UFCG
> >>> >> >>> B.sc. in Computer Science at UFCG
> >>> >> >>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>> >> >
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>> >--
> >>> >------------------------------------------
> >>> >Telles Mota Vidal Nobrega
> >>> >M.sc. Candidate at UFCG
> >>> >B.sc. in Computer Science at UFCG
> >>> >Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>>
> >>>
> >>
> >>
> >> --
> >> ------------------------------------------
> >> Telles Mota Vidal Nobrega
> >> M.sc. Candidate at UFCG
> >> B.sc. in Computer Science at UFCG
> >> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>
> >
> >
> >
> >--
> >------------------------------------------
> >Telles Mota Vidal Nobrega
> >M.sc. Candidate at UFCG
> >B.sc. in Computer Science at UFCG
> >Software Engineer at OpenStack Project - HP/LSD-UFCG
>
>


-- 
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
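
The container-count rule from Chris's replies quoted above can be sketched
roughly as follows: the number of containers you end up with is the requested
yarn.container.count, limited by the number of input partitions and by what
the grid has room for. The function and parameter names are hypothetical and
this is not Samza or YARN code, only an illustration of the rule as stated in
the thread.

```python
from typing import Optional


def effective_containers(requested: int,
                         input_partitions: int,
                         grid_capacity: Optional[int] = None) -> int:
    """Return how many containers a job can expect to get.

    requested        -- value set for yarn.container.count
    input_partitions -- partitions in the job's input stream
    grid_capacity    -- containers the grid can currently fit, if known
                        (hypothetical input; None means "not a constraint")
    """
    if requested < 1 or input_partitions < 1:
        raise ValueError("requested and input_partitions must be at least 1")
    limits = [requested, input_partitions]
    if grid_capacity is not None:
        limits.append(grid_capacity)
    # The tightest of the limits decides the outcome.
    return min(limits)


if __name__ == "__main__":
    # Hypothetical case: 5 containers requested on a 3-partition input.
    print(effective_containers(5, 3))
```

If fewer containers show up than requested, checking the partition count of
the input topic and the free memory on the grid covers both cases Chris lists.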
