I was able to fix this problem, but now I’m having another one. I’m using a script that starts Kafka, deploys the Samza jobs, stops them, kills Kafka, and deletes the configuration in ZooKeeper and the kafka-log files. Then it starts over again. I see that sometimes the jobs don’t start running; they stay in the accepted state with the info LOCALIZING. What can be the cause of that?
Thanks.

On 15 Aug 2014, at 19:18, Chris Riccomini <[email protected]> wrote:

> Hey Telles,
>
> If you set yarn.container.count to 5, you should get 5 containers. The two
> cases where you don't are:
>
> 1. The grid is at capacity, and doesn't have the memory to fulfill all
>    container requests.
> 2. You set yarn.container.count higher than the number of partitions that
>    your input stream has.
>
> Cheers,
> Chris
>
> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
>
>> Hi Chris,
>>
>> I started playing with the yarn.container.count and set it to 5.
>>
>> At first I thought I had to compile the package again and republish to
>> hdfs because I couldn't run 5 containers.
>> Then I recompiled but I still only got 3 containers, is that normal
>> behaviour?
>>
>> Thanks.
>>
>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]>
>> wrote:
>>
>>> Thanks Chris, i will take a look at this links and I will come back if I
>>> have more questions.
>>>
>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <[email protected]>
>>> wrote:
>>>
>>>> Hey Telles,
>>>>
>>>>>> Should I use many kafka brokers or one will suffice?
>>>>
>>>> The number of brokers you use is dependent on the number of messages/sec
>>>> you're going to receive, the size of those messages, and how long you're
>>>> going to retain them.
>>>>
>>>> Here is a good blog post on Kafka performance that should give you some
>>>> idea of the numbers:
>>>>
>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>>>
>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>> instances of this job so I could process a heavy load of messages?
>>>>
>>>> You should adjust the yarn.container.count to increase the parallelism of
>>>> your job. By default, you get one container, but you can adjust this up to
>>>> the total number of input partitions that you have. Have a look here for
>>>> some details about how Samza's parallelism works:
>>>>
>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>>>
>>>>> Should I use many kafka brokers or one will sufice?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>> instances of this job so I could process a heavy load of messages?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>>>>>
>>>>>>> *"Does one kafka-broker handle this much messages per second?"*
>>>>>>>
>>>>>>> I believe @Chris has better answer about this.
>>>>>>>
>>>>>>> *"I have one job that get this messages and another that reads from the
>>>>>>> output of the first job that does some more processing."*
>>>>>>>
>>>>>>> Why not use one job get messages and process them?
>>>>>>>
>>>>>>> *"when I change a configuration of one my jobs do I need to recompile it
>>>>>>> and send the new tar.gz to hdfs or just change the deploy/samza config
>>>>>>> and it should work."*
>>>>>>>
>>>>>>> No, you don't need to recompile. Change the config and run-job. It will
>>>>>>> work.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Fang, Yan
>>>>>>> [email protected]
>>>>>>> +1 (206) 849-4108
>>>>>>>
>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Not completely related to the topic of the question but when I change a
>>>>>>>> configuration of one my jobs do I need to recompile it and send the new
>>>>>>>> tar.gz to hdfs or just change the deploy/samza config and it should work.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi, I'm running an experiment that I'm suppose to run samza with
>>>>>>>>> different input rates. First I'm running with 420 messages/second and I
>>>>>>>>> scale up to 33200 messages/second.
>>>>>>>>>
>>>>>>>>> Does one kafka-broker handle this much messages per second?
>>>>>>>>> Second, what is the best way to read into samza this much messages? I
>>>>>>>>> have one job that get this messages and another that reads from the
>>>>>>>>> output of the first job that does some more processing. Is the best way
>>>>>>>>> to use more containers and split kafka topics in partitions (the same
>>>>>>>>> number of containers) or is there a better way to do this.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ------------------------------------------
>>>>>>>>> Telles Mota Vidal Nobrega
>>>>>>>>> M.sc. Candidate at UFCG
>>>>>>>>> B.sc. in Computer Science at UFCG
>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
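
For reference, the two cases Chris lists in the thread above (grid at capacity, or yarn.container.count set higher than the number of input partitions) can be read as a simple upper bound on how many containers actually run. The sketch below is only an illustration of that reading, not Samza's or YARN's real allocation logic: the function name expected_containers and the grid_slots parameter (how many containers the grid still has memory for) are hypothetical, introduced here for the example.

```python
def expected_containers(requested: int, input_partitions: int, grid_slots: int) -> int:
    """Hypothetical helper: rough upper bound on the containers a job gets.

    requested        -- the value set for yarn.container.count
    input_partitions -- number of partitions in the job's input stream
    grid_slots       -- assumed number of containers the grid can still fit
    """
    if requested < 1:
        raise ValueError("yarn.container.count must be at least 1")
    # The smallest of the three limits wins; never report a negative count.
    return max(0, min(requested, input_partitions, grid_slots))


# Requesting 5 containers against an input topic with only 3 partitions.
print(expected_containers(requested=5, input_partitions=3, grid_slots=10))
```

Under this reading, asking for 5 containers and seeing 3 would be consistent with an input topic that has 3 partitions, or with a grid that only has room for 3 containers, although the thread does not confirm which one applied.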
