Chris, I'm using HDFS. I will run again and see if the problem happens, and I will post if I find any problem or have more questions.
Thanks.

On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <[email protected]> wrote:

> Hey Telles,
>
> Usually, when a job is stuck in LOCALIZING, it means that YARN is
> struggling to distribute your binary (the .tgz) to the appropriate
> NodeManagers, I think. You should check your NM logs and see if there are
> any hints about what's going on there.
>
> I've seen this in the past when the NM hangs trying to download a .tgz
> from the HTTP server for some reason.
>
> Cheers,
> Chris
>
> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>
> >I was able to fix this problem, now I'm having another one. I'm using a
> >script that starts kafka, deploys samza jobs, stops them, kills kafka and
> >deletes configurations in zookeeper and kafka-log files. Then it starts
> >over again. I see that sometimes jobs don't start running, they stay in
> >accepted state with info LOCALIZING. What can be the cause of that?
> >
> >Thanks.
> >
> >On 15 Aug 2014, at 19:18, Chris Riccomini <[email protected]> wrote:
> >
> >> Hey Telles,
> >>
> >> If you set yarn.container.count to 5, you should get 5 containers. The
> >> two cases where you don't are:
> >>
> >> 1. The grid is at capacity, and doesn't have the memory to fulfill all
> >>    container requests.
> >> 2. You set yarn.container.count higher than the number of partitions
> >>    that your input stream has.
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
> >>
> >>> Hi Chris,
> >>>
> >>> I started playing with the yarn.container.count and set it to 5.
> >>>
> >>> At first I thought I had to compile the package again and republish
> >>> to hdfs because I couldn't run 5 containers.
> >>> Then I recompiled but I still only got 3 containers, is that normal
> >>> behaviour?
> >>>
> >>> Thanks.
> >>>
> >>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]>
> >>> wrote:
> >>>
> >>>> Thanks Chris, I will take a look at these links and I will come back
> >>>> if I have more questions.
> >>>>
> >>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini
> >>>> <[email protected]> wrote:
> >>>>
> >>>>> Hey Telles,
> >>>>>
> >>>>>>> Should I use many kafka brokers or one will suffice?
> >>>>>
> >>>>> The number of brokers you use is dependent on the number of
> >>>>> messages/sec you're going to receive, the size of those messages,
> >>>>> and how long you're going to retain them.
> >>>>>
> >>>>> Here is a good blog post on Kafka performance that should give you
> >>>>> some idea of the numbers:
> >>>>>
> >>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> >>>>>
> >>>>>>> It could be just one job, but what is the best way to deploy many
> >>>>>>> instances of this job so I could process a heavy load of messages?
> >>>>>
> >>>>> You should adjust the yarn.container.count to increase the
> >>>>> parallelism of your job. By default, you get one container, but you
> >>>>> can adjust this up to the total number of input partitions that you
> >>>>> have. Have a look here for some details about how Samza's
> >>>>> parallelism works:
> >>>>>
> >>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
> >>>>>
> >>>>> Cheers,
> >>>>> Chris
> >>>>>
> >>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
> >>>>>
> >>>>>> Should I use many kafka brokers or will one suffice?
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega
> >>>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>>> It could be just one job, but what is the best way to deploy many
> >>>>>>> instances of this job so I could process a heavy load of messages?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> *"Does one kafka-broker handle this many messages per second?"*
> >>>>>>>>
> >>>>>>>> I believe @Chris has a better answer about this.
> >>>>>>>>
> >>>>>>>> *"I have one job that gets these messages and another that reads
> >>>>>>>> from the output of the first job and does some more processing."*
> >>>>>>>>
> >>>>>>>> Why not use one job to get the messages and process them?
> >>>>>>>>
> >>>>>>>> *"when I change a configuration of one of my jobs do I need to
> >>>>>>>> recompile it and send the new tar.gz to hdfs or just change the
> >>>>>>>> deploy/samza config and it should work."*
> >>>>>>>>
> >>>>>>>> No, you don't need to recompile. Change the config and run-job.
> >>>>>>>> It will work.
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> Fang, Yan
> >>>>>>>> [email protected]
> >>>>>>>> +1 (206) 849-4108
> >>>>>>>>
> >>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega
> >>>>>>>> <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Not completely related to the topic of the question, but when I
> >>>>>>>>> change a configuration of one of my jobs do I need to recompile
> >>>>>>>>> it and send the new tar.gz to hdfs, or just change the
> >>>>>>>>> deploy/samza config and it should work?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega
> >>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run
> >>>>>>>>>> samza with different input rates. First I'm running with 420
> >>>>>>>>>> messages/second and I scale up to 33200 messages/second.
> >>>>>>>>>>
> >>>>>>>>>> Does one kafka-broker handle this many messages per second?
> >>>>>>>>>> Second, what is the best way to read this many messages into
> >>>>>>>>>> samza? I have one job that gets these messages and another
> >>>>>>>>>> that reads from the output of the first job and does some more
> >>>>>>>>>> processing. Is the best way to use more containers and split
> >>>>>>>>>> kafka topics into partitions (the same number as containers),
> >>>>>>>>>> or is there a better way to do this?
> >>>>>>>>>>
> >>>>>>>>>> Thanks in advance,
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> ------------------------------------------
> >>>>>>>>>> Telles Mota Vidal Nobrega
> >>>>>>>>>> M.sc. Candidate at UFCG
> >>>>>>>>>> B.sc. in Computer Science at UFCG
> >>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG

--
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
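
Note on the container counts discussed in the quoted messages: what follows is a minimal sketch, in Python, of the rule Chris gives for yarn.container.count. It assumes only the simplified model from the thread (the requested count is capped by the number of input partitions, and a grid at capacity may grant fewer still). The function name expected_containers and the free_slots parameter are hypothetical, introduced for illustration; they are not part of Samza or YARN.

```python
from typing import Optional


def expected_containers(requested: int,
                        input_partitions: int,
                        free_slots: Optional[int] = None) -> int:
    """Estimate how many containers a job ends up with.

    Simplified model of the two cases named in the thread:
      1. the grid lacks capacity for every container request, and
      2. yarn.container.count exceeds the input stream's partition count.
    free_slots is a hypothetical stand-in for "how many containers the
    grid can still fit"; None means capacity is not a constraint.
    """
    if requested < 1 or input_partitions < 1:
        raise ValueError("requested and input_partitions must be >= 1")
    # Case 2: never more containers than input partitions.
    limit = min(requested, input_partitions)
    # Case 1: a grid at capacity may grant fewer than that.
    if free_slots is not None:
        if free_slots < 0:
            raise ValueError("free_slots must be >= 0")
        limit = min(limit, free_slots)
    return limit


print(expected_containers(5, 3))                 # capped by partitions
print(expected_containers(5, 8))                 # request fully satisfied
print(expected_containers(5, 8, free_slots=2))   # capped by grid capacity
```

Under this model, getting 3 containers after requesting 5 would be consistent with either a 3-partition input stream or a grid that had room for only 3. The thread does not say which applied.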
