Re: How to run multiple instances of the same job

Telles Nobrega Fri, 22 Aug 2014 08:02:25 -0700

I was mistaken. Kafka receives them all; Samza doesn't process all of them
because I'm not using a buffer.
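
For a rough sense of why unbuffered sends can fall short of the ~30000 messages/second mentioned below: with sync sends and a batch size of 1 (the settings quoted further down), each message waits for a full round trip to the broker, so a single producer tops out near 1 / round-trip time. A minimal sketch of that arithmetic follows. The 1 ms round trip and the batch size of 200 are made-up figures for illustration, not measurements from this thread, and the function names are hypothetical.

```python
import math


def max_sync_rate(round_trip_us: int, batch_size: int = 1) -> float:
    """Upper bound on messages/second for one producer that waits for an
    acknowledgement after every batch (round trip given in microseconds)."""
    return batch_size * 1_000_000 / round_trip_us


def producers_needed(target_rate: int, round_trip_us: int, batch_size: int = 1) -> int:
    """Number of such producers needed to reach target_rate messages/second."""
    return math.ceil(target_rate / max_sync_rate(round_trip_us, batch_size))


ROUND_TRIP_US = 1000   # assumed 1 ms broker round trip (hypothetical)
TARGET_RATE = 30000    # messages/second, the figure from the thread

print(max_sync_rate(ROUND_TRIP_US))                                  # per-producer ceiling, unbatched
print(producers_needed(TARGET_RATE, ROUND_TRIP_US))                  # producers needed, unbatched
print(producers_needed(TARGET_RATE, ROUND_TRIP_US, batch_size=200))  # producers needed, batched
```

Under these assumptions the unbatched path needs many parallel producers, while even modest batching brings the count down sharply; real numbers depend on the cluster and should be measured.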



On Fri, Aug 22, 2014 at 11:55 AM, Chris Riccomini <
[email protected]> wrote:

> Hey Telles,
>
> >> So to increase this number I'm using many producers, but it seems like
> >> Kafka is not accepting them all.
>
> When you say Kafka is not accepting them, what do you mean? Kafka
> generally doesn't reject messages unless the size of the message that
> you're sending is too large (message.max.bytes in
> http://kafka.apache.org/documentation.html#brokerconfigs).
>
> Cheers,
> Chris
>
> On 8/21/14 4:05 PM, "Telles Nobrega" <[email protected]> wrote:
>
> >Thanks. I need to send lots of messages to Kafka, and I'm using a producer
> >that connects to Kafka to send them. To increase this number I'm using many
> >producers, but it seems like Kafka is not accepting them all. Is there a way
> >to work around this? I need something like 30000 messages per second.
> >
> >Thanks
> >
> >
> >On Wed, Aug 20, 2014 at 6:47 PM, Chris Riccomini <
> >[email protected]> wrote:
> >
> >> Hey Telles,
> >>
> >> The Samza job can be configured to disable batching and use sync sends:
> >>
> >> systems.kafka.producer.producer.type=sync
> >> systems.kafka.producer.batch.num.messages=1
> >>
> >> This is how the hello-samza job works. :)
> >>
> >>
> >> Note that it will dramatically affect your throughput, but if you're doing
> >> this, you probably have a low throughput topic anyway.
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 8/20/14 1:21 PM, "Telles Nobrega" <[email protected]> wrote:
> >>
> >> >Chris, is there a way to completely eliminate buffering in Samza + Kafka?
> >> >
> >> >
> >> >On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega
> >><[email protected]>
> >> >wrote:
> >> >
> >> >> I see. Thanks. The weird thing is that it works for some rounds and then stops.
> >> >>
> >> >>
> >> >> On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <
> >> >> [email protected]> wrote:
> >> >>
> >> >>> Hey Telles,
> >> >>>
> >> >>> The problem could occur with HDFS. I believe that LOCALIZING just means
> >> >>> that the NM is trying to download the artifact from wherever it is (be
> >> >>> that HTTP, HDFS, etc).
> >> >>>
> >> >>> Cheers,
> >> >>> Chris
> >> >>>
> >> >>> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]>
> >>wrote:
> >> >>>
> >> >>> >Chris,
> >> >>> >
> >> >>> >I'm using HDFS. I will run it again and see if the problem happens, and I
> >> >>> >will post if I find any problem or have more questions.
> >> >>> >
> >> >>> >Thanks.
> >> >>> >
> >> >>> >
> >> >>> >On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <
> >> >>> >[email protected]> wrote:
> >> >>> >
> >> >>> >> Hey Telles,
> >> >>> >>
> >> >>> >> Usually, when a job is stuck in LOCALIZING, it means that YARN is
> >> >>> >> struggling to distribute your binary (the .tgz) to the appropriate
> >> >>> >> NodeManagers, I think. You should check your NM logs and see if there are
> >> >>> >> any hints about what's going on there.
> >> >>> >>
> >> >>> >> I've seen this in the past when the NM hangs trying to download a .tgz
> >> >>> >> from the HTTP server for some reason.
> >> >>> >>
> >> >>> >> Cheers,
> >> >>> >> Chris
> >> >>> >>
> >> >>> >> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]>
> >> >>>wrote:
> >> >>> >>
> >> >>> >> >I was able to fix this problem; now I'm having another one. I'm using a
> >> >>> >> >script that starts Kafka, deploys the Samza jobs, stops them, kills Kafka,
> >> >>> >> >and deletes the configuration in ZooKeeper and the kafka-log files. Then it
> >> >>> >> >starts over again. I see that sometimes jobs don't start running; they stay
> >> >>> >> >in the ACCEPTED state with info LOCALIZING. What can be the cause of that?
> >> >>> >> >
> >> >>> >> >Thanks.
> >> >>> >> >On 15 Aug 2014, at 19:18, Chris Riccomini
> >> >>> >> ><[email protected]> wrote:
> >> >>> >> >
> >> >>> >> >> Hey Telles,
> >> >>> >> >>
> >> >>> >> >> If you set yarn.container.count to 5, you should get 5 containers. The
> >> >>> >> >> two cases where you don't are:
> >> >>> >> >>
> >> >>> >> >> 1. The grid is at capacity, and doesn't have the memory to fulfill all
> >> >>> >> >> container requests.
> >> >>> >> >> 2. You set yarn.container.count higher than the number of partitions that
> >> >>> >> >> your input stream has.
> >> >>> >> >>
> >> >>> >> >> Cheers,
> >> >>> >> >> Chris
> >> >>> >> >>
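
The two cases above amount to a simple cap on the container count. A rough sketch of that rule follows; it is not Samza's actual code, and the function names, the `grid_capacity` parameter, and the round-robin assignment of partitions to containers are assumptions made for illustration.

```python
from typing import Dict, List


def effective_containers(requested: int, input_partitions: int, grid_capacity: int) -> int:
    """Containers a job can end up with: the requested count (yarn.container.count),
    capped by the number of input partitions and by what the grid can hold."""
    return max(0, min(requested, input_partitions, grid_capacity))


def assign_partitions(input_partitions: int, containers: int) -> Dict[int, List[int]]:
    """Spread partition ids over containers round-robin (an assumed scheme)."""
    if containers < 1:
        raise ValueError("need at least one container")
    assignment: Dict[int, List[int]] = {c: [] for c in range(containers)}
    for partition in range(input_partitions):
        assignment[partition % containers].append(partition)
    return assignment


# Hypothetical numbers: 5 containers requested, 3 input partitions, room for 10.
count = effective_containers(requested=5, input_partitions=3, grid_capacity=10)
print(count)
print(assign_partitions(input_partitions=3, containers=count))
```

With these hypothetical numbers, asking for 5 containers against a 3-partition input yields 3, which is one possible explanation for the behaviour described in the quoted question below; the actual partition count is not stated in the thread.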
> >> >>> >> >> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]
> >
> >> >>> wrote:
> >> >>> >> >>
> >> >>> >> >>> Hi Chris,
> >> >>> >> >>>
> >> >>> >> >>> I started playing with the yarn.container.count and set it to 5.
> >> >>> >> >>>
> >> >>> >> >>> At first I thought I had to compile the package again and republish to
> >> >>> >> >>> HDFS because I couldn't run 5 containers.
> >> >>> >> >>> Then I recompiled but I still only got 3 containers. Is that normal
> >> >>> >> >>> behaviour?
> >> >>> >> >>>
> >> >>> >> >>> Thanks.
> >> >>> >> >>>
> >> >>> >> >>>
> >> >>> >> >>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega
> >> >>> >> >>><[email protected]>
> >> >>> >> >>> wrote:
> >> >>> >> >>>
> >> >>> >> >>>> Thanks Chris, I will take a look at these links and I will come back if
> >> >>> >> >>>> I have more questions.
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
> >> >>> >> >>>> [email protected]> wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>> Hey Telles,
> >> >>> >> >>>>>
> >> >>> >> >>>>>>> Should I use many kafka brokers or one will suffice?
> >> >>> >> >>>>>
> >> >>> >> >>>>> The number of brokers you use is dependent on the number of messages/sec
> >> >>> >> >>>>> you're going to receive, the size of those messages, and how long you're
> >> >>> >> >>>>> going to retain them.
> >> >>> >> >>>>>
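
Those three factors multiply into an ingest rate and a retained-data volume, which is a reasonable first input when sizing brokers. A back-of-the-envelope sketch follows: the 33200 messages/second figure comes from the original question in this thread, while the 200-byte message size, the 7-day retention, and the replication factor of 1 are placeholder assumptions, and the function names are hypothetical.

```python
def ingest_bytes_per_sec(messages_per_sec: int, message_bytes: int) -> int:
    """Raw write rate the brokers must absorb, before replication."""
    return messages_per_sec * message_bytes


def retained_bytes(messages_per_sec: int, message_bytes: int,
                   retention_secs: int, replication: int = 1) -> int:
    """Total bytes held on disk across the cluster once retention is full."""
    return ingest_bytes_per_sec(messages_per_sec, message_bytes) * retention_secs * replication


RATE = 33200                   # messages/second, from the thread
MESSAGE_BYTES = 200            # assumed average message size (hypothetical)
RETENTION_SECS = 7 * 24 * 3600 # assumed 7-day retention (hypothetical)

print(ingest_bytes_per_sec(RATE, MESSAGE_BYTES) / 1e6, "MB/s")
print(retained_bytes(RATE, MESSAGE_BYTES, RETENTION_SECS) / 1e12, "TB")
```

Comparing the resulting write rate and disk footprint against what one machine can sustain gives a first estimate of the broker count; the benchmark linked below is a better guide to actual per-broker limits.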
> >> >>> >> >>>>> Here is a good blog post on Kafka performance that should give you some
> >> >>> >> >>>>> idea of the numbers:
> >> >>> >> >>>>>
> >> >>> >> >>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> >> >>> >> >>>>>
> >> >>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
> >> >>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
> >> >>> >> >>>>>
> >> >>> >> >>>>> You should adjust the yarn.container.count to increase the parallelism of
> >> >>> >> >>>>> your job. By default, you get one container, but you can adjust this up to
> >> >>> >> >>>>> the total number of input partitions that you have. Have a look here for
> >> >>> >> >>>>> some details about how Samza's parallelism works:
> >> >>> >> >>>>>
> >> >>> >> >>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
> >> >>> >> >>>>>
> >> >>> >> >>>>>
> >> >>> >> >>>>>
> >> >>> >> >>>>>
> >> >>> >> >>>>> Cheers,
> >> >>> >> >>>>> Chris
> >> >>> >> >>>>>
> >> >>> >> >>>>> On 8/13/14 9:37 AM, "Telles Nobrega"
> >><[email protected]
> >> >
> >> >>> >> wrote:
> >> >>> >> >>>>>
> >> >>> >> >>>>>> Should I use many Kafka brokers, or will one suffice?
> >> >>> >> >>>>>>
> >> >>> >> >>>>>> Thanks
> >> >>> >> >>>>>>
> >> >>> >> >>>>>>
> >> >>> >> >>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega
> >> >>> >> >>>>> <[email protected]
> >> >>> >> >>>>>>
> >> >>> >> >>>>>> wrote:
> >> >>> >> >>>>>>
> >> >>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
> >> >>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
> >> >>> >> >>>>>>>
> >> >>> >> >>>>>>> Thanks,
> >> >>> >> >>>>>>>
> >> >>> >> >>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]
> >
> >> >>> wrote:
> >> >>> >> >>>>>>>
> >> >>> >> >>>>>>>> *"Does one kafka-broker handle this much messages per second?"*
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>> I believe @Chris has a better answer about this.
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>> *"I have one job that get this messages and another that reads from the
> >> >>> >> >>>>>>>> output of the first job that does some more processing."*
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>>   Why not use one job to get the messages and process them?
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>> *"when I change a configuration of one my jobs do I need to recompile it
> >> >>> >> >>>>>>>> and send the new tar.gz to hdfs or just change the deploy/samza config
> >> >>> >> >>>>>>>> and it should work."*
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>>   No, you don't need to recompile. Change the config and run-job. It
> >> >>> >> >>>>>>>> will work.
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>> Thanks.
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>> Cheers,
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>> Fang, Yan
> >> >>> >> >>>>>>>> [email protected]
> >> >>> >> >>>>>>>> +1 (206) 849-4108
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega
> >> >>> >> >>>>>>> <[email protected]
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>> wrote:
> >> >>> >> >>>>>>>>
> >> >>> >> >>>>>>>>> Not completely related to the topic of the question, but when I change
> >> >>> >> >>>>>>>>> a configuration of one of my jobs, do I need to recompile it and send
> >> >>> >> >>>>>>>>> the new tar.gz to HDFS, or can I just change the deploy/samza config
> >> >>> >> >>>>>>>>> and it should work?
> >> >>> >> >>>>>>>>>
> >> >>> >> >>>>>>>>> Thanks
> >> >>> >> >>>>>>>>>
> >> >>> >> >>>>>>>>>
> >> >>> >> >>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <
> >> >>> >> >>>>>>> [email protected]>
> >> >>> >> >>>>>>>>> wrote:
> >> >>> >> >>>>>>>>>
> >> >>> >> >>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run Samza with
> >> >>> >> >>>>>>>>>> different input rates. First I'm running with 420 messages/second, and
> >> >>> >> >>>>>>>>>> I scale up to 33200 messages/second.
> >> >>> >> >>>>>>>>>>
> >> >>> >> >>>>>>>>>> Does one Kafka broker handle this many messages per second?
> >> >>> >> >>>>>>>>>> Second, what is the best way to read this many messages into Samza? I
> >> >>> >> >>>>>>>>>> have one job that gets these messages and another that reads from the
> >> >>> >> >>>>>>>>>> output of the first job and does some more processing. Is the best way
> >> >>> >> >>>>>>>>>> to use more containers and split the Kafka topics into partitions (the
> >> >>> >> >>>>>>>>>> same number as containers), or is there a better way to do this?
> >> >>> >> >>>>>>>>>>
> >> >>> >> >>>>>>>>>> Thanks in advance,
> >> >>> >> >>>>>>>>>>
> >> >>> >> >>>>>>>>>> --
> >> >>> >> >>>>>>>>>> ------------------------------------------
> >> >>> >> >>>>>>>>>> Telles Mota Vidal Nobrega
> >> >>> >> >>>>>>>>>> M.sc. Candidate at UFCG
> >> >>> >> >>>>>>>>>> B.sc. in Computer Science at UFCG
> >> >>> >> >>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >> >>> >> >>>>>>>>>>
> >> >>> >> >>>>>>>>>
> >> >>> >> >>>>>>>>>
> >> >>> >> >>>>>>>>>
> >> >>> >> >>>>>>>>> --
> >> >>> >> >>>>>>>>> ------------------------------------------
> >> >>> >> >>>>>>>>> Telles Mota Vidal Nobrega
> >> >>> >> >>>>>>>>> M.sc. Candidate at UFCG
> >> >>> >> >>>>>>>>> B.sc. in Computer Science at UFCG
> >> >>> >> >>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >> >>> >> >>>>>>>>>
> >> >>> >> >>>>>>>
> >> >>> >> >>>>>>>
> >> >>> >> >>>>>>
> >> >>> >> >>>>>>
> >> >>> >> >>>>>> --
> >> >>> >> >>>>>> ------------------------------------------
> >> >>> >> >>>>>> Telles Mota Vidal Nobrega
> >> >>> >> >>>>>> M.sc. Candidate at UFCG
> >> >>> >> >>>>>> B.sc. in Computer Science at UFCG
> >> >>> >> >>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >> >>> >> >>>>>
> >> >>> >> >>>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> --
> >> >>> >> >>>> ------------------------------------------
> >> >>> >> >>>> Telles Mota Vidal Nobrega
> >> >>> >> >>>> M.sc. Candidate at UFCG
> >> >>> >> >>>> B.sc. in Computer Science at UFCG
> >> >>> >> >>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >> >>> >> >>>>
> >> >>> >> >>>
> >> >>> >> >>>
> >> >>> >> >>>
> >> >>> >> >>> --
> >> >>> >> >>> ------------------------------------------
> >> >>> >> >>> Telles Mota Vidal Nobrega
> >> >>> >> >>> M.sc. Candidate at UFCG
> >> >>> >> >>> B.sc. in Computer Science at UFCG
> >> >>> >> >>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >> >>> >> >
> >> >>> >>
> >> >>> >>
> >> >>> >
> >> >>> >
> >> >>> >--
> >> >>> >------------------------------------------
> >> >>> >Telles Mota Vidal Nobrega
> >> >>> >M.sc. Candidate at UFCG
> >> >>> >B.sc. in Computer Science at UFCG
> >> >>> >Software Engineer at OpenStack Project - HP/LSD-UFCG
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >> --
> >> >> ------------------------------------------
> >> >> Telles Mota Vidal Nobrega
> >> >> M.sc. Candidate at UFCG
> >> >> B.sc. in Computer Science at UFCG
> >> >> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >> >>
> >> >
> >> >
> >> >
> >> >--
> >> >------------------------------------------
> >> >Telles Mota Vidal Nobrega
> >> >M.sc. Candidate at UFCG
> >> >B.sc. in Computer Science at UFCG
> >> >Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>
> >>
> >
> >
> >--
> >------------------------------------------
> >Telles Mota Vidal Nobrega
> >M.sc. Candidate at UFCG
> >B.sc. in Computer Science at UFCG
> >Software Engineer at OpenStack Project - HP/LSD-UFCG
>
>


-- 
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
