Hey Telles,

The Samza job can be configured to disable batching and use sync sends:

systems.kafka.producer.producer.type=sync
systems.kafka.producer.batch.num.messages=1

This is how the hello-samza job works. :)

Note that it will dramatically reduce your throughput, but if you're doing
this, you probably have a low throughput topic anyway.

Cheers,
Chris

On 8/20/14 1:21 PM, "Telles Nobrega" <[email protected]> wrote:

> Chris, is there a way to completely eliminate buffering in samza + kafka?
>
> On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega <[email protected]> wrote:
>
>> I see. Thanks. Weird thing is it works some rounds and then stops.
>>
>> On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <[email protected]> wrote:
>>
>>> Hey Telles,
>>>
>>> The problem could occur with HDFS. I believe that LOCALIZING just means
>>> that the NM is trying to download the artifact from wherever it is (be
>>> that HTTP, HDFS, etc).
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:
>>>
>>>> Chris,
>>>>
>>>> I'm using HDFS. I will run again and see if the problem happens, and I
>>>> will post if I find any problem or have more questions.
>>>>
>>>> Thanks.
>>>>
>>>> On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <[email protected]> wrote:
>>>>
>>>>> Hey Telles,
>>>>>
>>>>> Usually, when a job is stuck in LOCALIZING, it means that YARN is
>>>>> struggling to distribute your binary (the .tgz) to the appropriate
>>>>> NodeManagers, I think. You should check your NM logs and see if there
>>>>> are any hints about what's going on there.
>>>>>
>>>>> I've seen this in the past when the NM hangs trying to download a .tgz
>>>>> from the HTTP server for some reason.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>>>>>
>>>>>> I was able to fix this problem, now I'm having another one.
>>>>>> I'm using a script that starts kafka, deploys samza jobs, stops them,
>>>>>> kills kafka and deletes configurations in zookeeper and kafka-log
>>>>>> files. Then it starts over again. I see that sometimes jobs don't start
>>>>>> running, they stay in accepted state with info LOCALIZING, what can be
>>>>>> the cause for that?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On 15 Aug 2014, at 19:18, Chris Riccomini <[email protected]> wrote:
>>>>>>
>>>>>>> Hey Telles,
>>>>>>>
>>>>>>> If you set yarn.container.count to 5, you should get 5 containers. The
>>>>>>> two cases where you don't are:
>>>>>>>
>>>>>>> 1. The grid is at capacity, and doesn't have the memory to fulfill all
>>>>>>>    container requests.
>>>>>>> 2. You set yarn.container.count higher than the number of partitions
>>>>>>>    that your input stream has.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> I started playing with the yarn.container.count and set it to 5.
>>>>>>>>
>>>>>>>> At first I thought I had to compile the package again and republish to
>>>>>>>> hdfs because I couldn't run 5 containers.
>>>>>>>> Then I recompiled but I still only got 3 containers, is that normal
>>>>>>>> behaviour?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Chris, I will take a look at these links and I will come back
>>>>>>>>> if I have more questions.
>>>>>>>>>
>>>>>>>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Telles,
>>>>>>>>>>
>>>>>>>>>> >> Should I use many kafka brokers or one will suffice?
>>>>>>>>>>
>>>>>>>>>> The number of brokers you use is dependent on the number of
>>>>>>>>>> messages/sec you're going to receive, the size of those messages, and
>>>>>>>>>> how long you're going to retain them.
>>>>>>>>>>
>>>>>>>>>> Here is a good blog post on Kafka performance that should give you
>>>>>>>>>> some idea of the numbers:
>>>>>>>>>>
>>>>>>>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>>>>>>>>>
>>>>>>>>>> >> It could be just one job, but what is the best way to deploy many
>>>>>>>>>> >> instances of this job so I could process a heavy load of messages?
>>>>>>>>>>
>>>>>>>>>> You should adjust the yarn.container.count to increase the parallelism
>>>>>>>>>> of your job. By default, you get one container, but you can adjust this
>>>>>>>>>> up to the total number of input partitions that you have.
>>>>>>>>>> Have a look here for some details about how Samza's parallelism works:
>>>>>>>>>>
>>>>>>>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Should I use many kafka brokers or one will suffice?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>>>>>>>> instances of this job so I could process a heavy load of messages?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> *"Does one kafka-broker handle this much messages per second?"*
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe @Chris has a better answer about this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *"I have one job that get this messages and another that reads from
>>>>>>>>>>>>> the output of the first job that does some more processing."*
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why not use one job to get the messages and process them?
>>>>>>>>>>>>>
>>>>>>>>>>>>> *"when I change a configuration of one of my jobs do I need to
>>>>>>>>>>>>> recompile it and send the new tar.gz to hdfs or just change the
>>>>>>>>>>>>> deploy/samza config and it should work?"*
>>>>>>>>>>>>>
>>>>>>>>>>>>> No, you don't need to recompile. Change the config and run-job. It
>>>>>>>>>>>>> will work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fang, Yan
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> +1 (206) 849-4108
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Not completely related to the topic of the question but when I
>>>>>>>>>>>>>> change a configuration of one of my jobs do I need to recompile it
>>>>>>>>>>>>>> and send the new tar.gz to hdfs or just change the deploy/samza
>>>>>>>>>>>>>> config and it should work?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, I'm running an experiment that I'm supposed to run samza with
>>>>>>>>>>>>>>> different input rates. First I'm running with 420 messages/second
>>>>>>>>>>>>>>> and I scale up to 33200 messages/second.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does one kafka-broker handle this much messages per second?
>>>>>>>>>>>>>>> Second, what is the best way to read into samza this much messages?
>>>>>>>>>>>>>>> I have one job that get this messages and another that reads from
>>>>>>>>>>>>>>> the output of the first job that does some more processing. Is the
>>>>>>>>>>>>>>> best way to use more containers and split kafka topics in
>>>>>>>>>>>>>>> partitions (the same number of containers) or is there a better way
>>>>>>>>>>>>>>> to do this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>>>>>> Telles Mota Vidal Nobrega
>>>>>>>>>>>>>>> M.sc. Candidate at UFCG
>>>>>>>>>>>>>>> B.sc. in Computer Science at UFCG
>>>>>>>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
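
To make the yarn.container.count point from the quoted thread concrete, here is a rough Python sketch. It only illustrates the rule described there (the number of useful containers is capped by the number of input partitions) and is not Samza's actual code. The function names and the round-robin assignment are hypothetical, chosen for illustration, and the case where the grid is at capacity is not modeled.

```python
from typing import Dict, List


def effective_containers(requested: int, input_partitions: int) -> int:
    """Cap the requested container count at the number of input partitions.

    Assumption for illustration: a container with no partition to read
    has no work, so asking for more containers than partitions gains nothing.
    """
    if requested < 1 or input_partitions < 1:
        raise ValueError("both counts must be positive")
    return min(requested, input_partitions)


def assign_partitions(requested: int, input_partitions: int) -> Dict[int, List[int]]:
    """Spread partition ids round-robin over the useful containers.

    Round-robin is a stand-in here, not necessarily how Samza groups work.
    """
    containers = effective_containers(requested, input_partitions)
    assignment: Dict[int, List[int]] = {c: [] for c in range(containers)}
    for partition in range(input_partitions):
        assignment[partition % containers].append(partition)
    return assignment


if __name__ == "__main__":
    # Hypothetical numbers: 5 requested containers, 3 input partitions.
    print(assign_partitions(5, 3))
```

With 5 requested containers and a 3-partition input, the sketch ends up with 3 containers holding one partition each. Going the other way, 2 containers over 5 partitions get three and two partitions respectively.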
