Re: How to run multiple instances of the same job

Telles Nobrega Mon, 18 Aug 2014 09:24:12 -0700

Chris,

I'm using HDFS. I will run it again to see whether the problem recurs, and I
will post back if I find anything or have more questions.


Thanks.


On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <
[email protected]> wrote:

> Hey Telles,
>
> Usually, when a job is stuck in LOCALIZING, it means that YARN is
> struggling to distribute your binary (the .tgz) to the appropriate
> NodeManagers, I think. You should check your NM logs and see if there are
> any hints about what's going on there.
>
> I've seen this in the past when the NM hangs trying to download a .tgz
> from the HTTP server for some reason.
>
> Cheers,
> Chris
>
> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>
> >I was able to fix this problem, but now I'm having another one. I'm using a
> >script that starts Kafka, deploys Samza jobs, stops them, kills Kafka, and
> >deletes the configuration in ZooKeeper and the kafka-log files. Then it
> >starts over again. Sometimes the jobs don't start running; they stay in the
> >ACCEPTED state with the info LOCALIZING. What could cause that?
> >
> >Thanks.
> >On 15 Aug 2014, at 19:18, Chris Riccomini
> ><[email protected]> wrote:
> >
> >> Hey Telles,
> >>
> >> If you set yarn.container.count to 5, you should get 5 containers. The
> >> two cases where you don't are:
> >>
> >> 1. The grid is at capacity, and doesn't have the memory to fulfill all
> >> container requests.
> >> 2. You set yarn.container.count higher than the number of partitions
> >> that your input stream has.
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
> >>
> >>> Hi Chris,
> >>>
> >>> I started playing with the yarn.container.count and set it to 5.
> >>>
> >>> At first I thought I had to compile the package again and republish it
> >>> to HDFS because I couldn't run 5 containers.
> >>> Then I recompiled, but I still only got 3 containers. Is that normal
> >>> behaviour?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]> wrote:
> >>>
> >>>> Thanks Chris, I will take a look at these links and come back if I
> >>>> have more questions.
> >>>>
> >>>>
> >>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
> >>>> [email protected]> wrote:
> >>>>
> >>>>> Hey Telles,
> >>>>>
> >>>>>>> Should I use many kafka brokers or one will suffice?
> >>>>>
> >>>>> The number of brokers you use is dependent on the number of
> >>>>> messages/sec you're going to receive, the size of those messages, and
> >>>>> how long you're going to retain them.
> >>>>>
> >>>>> Here is a good blog post on Kafka performance that should give you
> >>>>> some idea of the numbers:
> >>>>>
> >>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> >>>>>
> >>>>>>> It could be just one job, but what is the best way to deploy many
> >>>>>>> instances of this job so I could process a heavy load of messages?
> >>>>>
> >>>>> You should adjust the yarn.container.count to increase the
> >>>>> parallelism of your job. By default, you get one container, but you
> >>>>> can adjust this up to the total number of input partitions that you
> >>>>> have. Have a look here for some details about how Samza's parallelism
> >>>>> works:
> >>>>>
> >>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
> >>>>>
> >>>>> Cheers,
> >>>>> Chris
> >>>>>
> >>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
> >>>>>
> >>>>>> Should I use many kafka brokers or one will suffice?
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
> >>>>>>
> >>>>>>> It could be just one job, but what is the best way to deploy many
> >>>>>>> instances of this job so I could process a heavy load of messages?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> *"Does one kafka-broker handle this many messages per second?"*
> >>>>>>>>
> >>>>>>>> I believe @Chris has a better answer about this.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> *"I have one job that gets these messages and another that reads
> >>>>>>>> from the output of the first job and does some more processing."*
> >>>>>>>>
> >>>>>>>>   Why not use one job to get the messages and process them?
> >>>>>>>>
> >>>>>>>> *"When I change a configuration of one of my jobs, do I need to
> >>>>>>>> recompile it and send the new tar.gz to HDFS, or can I just change
> >>>>>>>> the deploy/samza config and it should work?"*
> >>>>>>>>
> >>>>>>>>   No, you don't need to recompile. Change the config and run-job. It
> >>>>>>>> will work.
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> Fang, Yan
> >>>>>>>> [email protected]
> >>>>>>>> +1 (206) 849-4108
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Not completely related to the topic of the question, but when I
> >>>>>>>>> change a configuration of one of my jobs, do I need to recompile it
> >>>>>>>>> and send the new tar.gz to HDFS, or can I just change the
> >>>>>>>>> deploy/samza config and it should work?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run Samza
> >>>>>>>>>> with different input rates. First I'm running with 420
> >>>>>>>>>> messages/second, and I scale up to 33200 messages/second.
> >>>>>>>>>>
> >>>>>>>>>> Does one Kafka broker handle this many messages per second?
> >>>>>>>>>> Second, what is the best way to read this many messages into
> >>>>>>>>>> Samza? I have one job that gets these messages and another that
> >>>>>>>>>> reads from the output of the first job and does some more
> >>>>>>>>>> processing. Is the best way to use more containers and split the
> >>>>>>>>>> Kafka topics into partitions (the same number as containers), or
> >>>>>>>>>> is there a better way to do this?
> >>>>>>>>>>
> >>>>>>>>>> Thanks in advance,
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> ------------------------------------------
> >>>>>>>>>> Telles Mota Vidal Nobrega
> >>>>>>>>>> M.sc. Candidate at UFCG
> >>>>>>>>>> B.sc. in Computer Science at UFCG
> >>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
>
>
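
A minimal sketch of the yarn.container.count rule quoted above, assuming the only limit is the number of input partitions (the grid-capacity case is not modelled). The function names and the round-robin assignment are hypothetical illustrations, not Samza's actual code; for example, a request for 5 containers against a 3-partition topic would yield 3 working containers.

```python
def effective_container_count(requested, input_partitions):
    """Return how many containers can receive work.

    Assumption: a container is only useful if it owns at least one input
    partition, so the count is capped by the number of partitions.
    """
    if requested < 1 or input_partitions < 1:
        raise ValueError("both values must be at least 1")
    return min(requested, input_partitions)


def assign_partitions(requested, input_partitions):
    """Spread partition ids over containers round-robin (illustrative only)."""
    containers = effective_container_count(requested, input_partitions)
    assignment = {c: [] for c in range(containers)}
    for partition in range(input_partitions):
        assignment[partition % containers].append(partition)
    return assignment


if __name__ == "__main__":
    # 5 containers requested, but only 3 partitions to read from.
    print(effective_container_count(5, 3))
    print(assign_partitions(5, 3))
    # 2 containers sharing 5 partitions.
    print(assign_partitions(2, 5))
```

Under this assumption, raising yarn.container.count past the partition count adds nothing; increasing the partition count of the input topic is what allows more containers to be used.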


-- 
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
