Thanks. I need to send a lot of messages to Kafka, and I'm using a producer that connects to Kafka to send them. To increase this number I'm running many producers, but it seems Kafka is not accepting them all. Is there a way to work around this? I need something like 30,000 messages per second.

Thanks

On Wed, Aug 20, 2014 at 6:47 PM, Chris Riccomini <[email protected]> wrote:

> Hey Telles,
>
> The Samza job can be configured to disable batching and use sync sends:
>
>   systems.kafka.producer.producer.type=sync
>   systems.kafka.producer.batch.num.messages=1
>
> This is how the hello-samza job works. :)
>
> Note that it will dramatically affect your throughput, but if you're doing
> this, you probably have a low throughput topic anyway.
>
> Cheers,
> Chris
>
> On 8/20/14 1:21 PM, "Telles Nobrega" <[email protected]> wrote:
>
>> Chris, is there a way to eliminate completely buffering in samza + kafka?
>>
>> On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega <[email protected]> wrote:
>>
>>> I see. Thanks. Weird thing is it works some rounds and than stops.
>>>
>>> On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <[email protected]> wrote:
>>>
>>>> Hey Telles,
>>>>
>>>> The problem could occur with HDFS. I believe that LOCALIZING just means
>>>> that the NM is trying to download the artifact from wherever it is (be
>>>> that HTTP, HDFS, etc).
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:
>>>>
>>>>> Chris,
>>>>>
>>>>> I'm using HDFS, I will run again and see if the problem happens and I will
>>>>> post if i find any problem or have more questions.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <[email protected]> wrote:
>>>>>
>>>>>> Hey Telles,
>>>>>>
>>>>>> Usually, when a job is stuck in LOCALIZING, it means that YARN is
>>>>>> struggling to distribute your binary (the .tgz) to the appropriate
>>>>>> NodeManagers, I think. You should check your NM logs and see if there are
>>>>>> any hints about what's going on there.
>>>>>>
>>>>>> I've seen this in the past when the NM hangs trying to download a .tgz
>>>>>> from the HTTP server for some reason.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>>>>>>
>>>>>>> I was able to fix this problem, now I'm having another one. I'm using a
>>>>>>> script that starts kafka, deploys samza jobs, stop them, kills kafka and
>>>>>>> delete configurations in zookeeper and kafka-log files. Them start over
>>>>>>> again. I see that sometimes jobs don't start running, they stay in
>>>>>>> accepted state with info LOCALIZING, what can be the cause for that?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On 15 Aug 2014, at 19:18, Chris Riccomini <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hey Telles,
>>>>>>>>
>>>>>>>> If you set yarn.container.count to 5, you should get 5 containers. The two
>>>>>>>> cases where you don't are:
>>>>>>>>
>>>>>>>> 1. The grid is at capacity, and doesn't have the memory to fulfill all
>>>>>>>>    container requests.
>>>>>>>> 2. You set yarn.container.count higher than the number of partitions that
>>>>>>>>    your input stream has.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Chris,
>>>>>>>>>
>>>>>>>>> I started playing with the yarn.container.count and set it to 5.
>>>>>>>>>
>>>>>>>>> At first I thought I had to compile the package again and republish to hdfs
>>>>>>>>> because I couldn't run 5 containers.
>>>>>>>>> Then I recompiled but I still only got 3 containers, is that normal
>>>>>>>>> behaviour?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Chris, i will take a look at this links and I will come back if I
>>>>>>>>>> have more questions.
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Telles,
>>>>>>>>>>>
>>>>>>>>>>>>> Should I use many kafka brokers or one will suffice?
>>>>>>>>>>>
>>>>>>>>>>> The number of brokers you use is dependent on the number of messages/sec
>>>>>>>>>>> you're going to receive, the size of those messages, and how long you're
>>>>>>>>>>> going to retain them.
>>>>>>>>>>>
>>>>>>>>>>> Here is a good blog post on Kafka performance that should give you some
>>>>>>>>>>> idea of the numbers:
>>>>>>>>>>>
>>>>>>>>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>>>>>>>>>>
>>>>>>>>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>>>>>>>>> instances of this job so I could process a heavy load of messages?
>>>>>>>>>>>
>>>>>>>>>>> You should adjust the yarn.container.count to increase the parallelism of
>>>>>>>>>>> your job. By default, you get one container, but you can adjust this up to
>>>>>>>>>>> the total number of input partitions that you have. Have a look here for
>>>>>>>>>>> some details about how Samza's parallelism works:
>>>>>>>>>>>
>>>>>>>>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Should I use many kafka brokers or one will sufice?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>>>>>>>>> instances of this job so I could process a heavy load of messages?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> *"Does one kafka-broker handle this much messages per second?"*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I believe @Chris has better answer about this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *"I have one job that get this messages and another that reads from the
>>>>>>>>>>>>>> output of the first job that does some more processing."*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why not use one job get messages and process them?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *"when I change a configuration of one my jobs do I need to recompile it and
>>>>>>>>>>>>>> send the new tar.gz to hdfs or just change the deploy/samza config and it
>>>>>>>>>>>>>> should work."*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, you don't need to recompile. Change the config and run-job. It will
>>>>>>>>>>>>>> work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fang, Yan
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> +1 (206) 849-4108
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Not completely related to the topic of the question but when I change a
>>>>>>>>>>>>>>> configuration of one my jobs do I need to recompile it and send the new
>>>>>>>>>>>>>>> tar.gz to hdfs or just change the deploy/samza config and it should work.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi, I'm running an experiment that I'm suppose to run samza with different
>>>>>>>>>>>>>>>> input rates. First I'm running with 420 messages/second and I scale up to
>>>>>>>>>>>>>>>> 33200 messages/second.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does one kafka-broker handle this much messages per second?
>>>>>>>>>>>>>>>> Second, what is the best way to read into samza this much messages? I have
>>>>>>>>>>>>>>>> one job that get this messages and another that reads from the output of
>>>>>>>>>>>>>>>> the first job that does some more processing. Is the best way to use more
>>>>>>>>>>>>>>>> containers and split kafka topics in partitions (the same number of
>>>>>>>>>>>>>>>> containers) or is there a better way to do this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks in advance,

--
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
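
A note on the throughput question at the top of this message: the sync settings Chris quotes (producer.type=sync, batch.num.messages=1) are the low-throughput end of the same producer options. If the 30,000 messages per second are sent from a Samza job, the opposite direction might look like the fragment below. This is only a sketch. It assumes the Kafka 0.8 producer options producer.type, batch.num.messages and queue.buffering.max.ms, passed through the systems.kafka.producer.* prefix shown in the thread, and the values are illustrative rather than measured.

```properties
# Assumed, untuned values: send messages in batches instead of one per request.
systems.kafka.producer.producer.type=async
# Number of messages collected before a batch is sent (illustrative).
systems.kafka.producer.batch.num.messages=200
# Upper bound in milliseconds on how long messages wait in the buffer (illustrative).
systems.kafka.producer.queue.buffering.max.ms=100
```

For a standalone Kafka 0.8 producer, the same options would presumably be set without the systems.kafka.producer. prefix. Whether this removes the limit described above depends on the broker and on the number of topic partitions, neither of which is stated in the thread.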
