I was mistaken: Kafka receives all of them. It's Samza that doesn't process them all, because I'm not using buffering.
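Telles's conclusion fits a simple back-of-the-envelope model of the sync-send configuration Chris describes further down in the thread (producer.type=sync, batch.num.messages=1). In that setup, every message waits for a full request round trip, so a single producer can send at most about 1000 / (round-trip milliseconds) messages per second. The sketch below assumes a hypothetical 2 ms round trip, which is not measured anywhere in this thread, and it ignores the fact that larger batches take somewhat longer to send. `max_sync_rate` and `producers_needed` are illustrative helpers, not Samza or Kafka APIs.

```python
import math


def max_sync_rate(round_trip_ms: float) -> float:
    """Upper bound on messages/sec for one producer that blocks on every request."""
    return 1000.0 / round_trip_ms


def producers_needed(target_rate: float, round_trip_ms: float, batch_size: int = 1) -> int:
    """Producers required to reach target_rate if each request carries batch_size messages.

    Simplified model: request latency is treated as independent of batch size.
    """
    per_producer = max_sync_rate(round_trip_ms) * batch_size
    return math.ceil(target_rate / per_producer)


# Hypothetical 2 ms round trip against the ~30000 msg/s target from the thread.
print(producers_needed(30000, 2.0))                  # one message per request
print(producers_needed(30000, 2.0, batch_size=200))  # batched sends
```

Under these assumptions, the first call needs 60 producers and the second needs 1. This is why disabling batching costs so much throughput.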
On Fri, Aug 22, 2014 at 11:55 AM, Chris Riccomini wrote:

Hey Telles,

> So, to increase this number I'm using many producers, but it seems like Kafka is not accepting them all.

When you say Kafka is not accepting them, what do you mean? Kafka generally doesn't reject messages unless the message you're sending is too large (see message.max.bytes in http://kafka.apache.org/documentation.html#brokerconfigs).

Cheers,
Chris

On 8/21/14 4:05 PM, "Telles Nobrega" wrote:

Thanks. I need to send lots of messages to Kafka, and I'm using a producer that connects to Kafka to send them. So, to increase this number I'm using many producers, but it seems like Kafka is not accepting them all. Is there a way to work around this? I need something like 30000 messages per second.

Thanks

On Wed, Aug 20, 2014 at 6:47 PM, Chris Riccomini wrote:

Hey Telles,

The Samza job can be configured to disable batching and use sync sends:

systems.kafka.producer.producer.type=sync
systems.kafka.producer.batch.num.messages=1

This is how the hello-samza job works. :)

Note that it will dramatically affect your throughput, but if you're doing this, you probably have a low-throughput topic anyway.

Cheers,
Chris

On 8/20/14 1:21 PM, "Telles Nobrega" wrote:

Chris, is there a way to completely eliminate buffering in Samza + Kafka?

On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega wrote:

I see. Thanks. The weird thing is that it works for some rounds and then stops.

On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini wrote:

Hey Telles,

The problem could occur with HDFS.
I believe that LOCALIZING just means that the NM is trying to download the artifact from wherever it is (be that HTTP, HDFS, etc.).

Cheers,
Chris

On 8/18/14 9:22 AM, "Telles Nobrega" wrote:

Chris,

I'm using HDFS. I will run it again to see if the problem happens, and I will post if I find any problem or have more questions.

Thanks.

On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini wrote:

Hey Telles,

Usually, when a job is stuck in LOCALIZING, it means that YARN is struggling to distribute your binary (the .tgz) to the appropriate NodeManagers, I think. You should check your NM logs and see if there are any hints about what's going on there.

I've seen this in the past when the NM hangs trying to download a .tgz from the HTTP server for some reason.

Cheers,
Chris

On 8/16/14 10:41 PM, "Telles Nobrega" wrote:

I was able to fix this problem, but now I'm having another one. I'm using a script that starts Kafka, deploys Samza jobs, stops them, kills Kafka, and deletes the configurations in ZooKeeper and the Kafka log files. Then it starts over again. I see that sometimes jobs don't start running; they stay in the accepted state with the info LOCALIZING. What can be the cause of that?

Thanks.
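Chris mentions NodeManagers hanging while downloading the job's .tgz over HTTP. If the package that yarn.package.path points at is served over HTTP, a quick sanity check is to confirm that the URL answers promptly from a NodeManager host. The sketch below uses a hypothetical helper, `package_reachable` (not part of Samza or YARN), that sends an HTTP HEAD request with a timeout using only the Python standard library. It does not handle hdfs:// URLs. For those, check the path with the HDFS tools instead.

```python
import urllib.error
import urllib.request


def package_reachable(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if an HTTP HEAD request for url succeeds (2xx) within timeout_s seconds.

    Non-HTTP schemes such as hdfs:// are reported as unreachable, because urllib
    has no handler for them.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP errors (404, 500, ...) and unknown URL schemes;
        # OSError covers timeouts and refused connections; ValueError covers malformed URLs.
        return False
```

If this returns False, or takes close to the full timeout, from a NodeManager host, the package server is a more likely culprit than Samza itself.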
On 15 Aug 2014, at 19:18, Chris Riccomini wrote:

Hey Telles,

If you set yarn.container.count to 5, you should get 5 containers. The two cases where you don't are:

1. The grid is at capacity and doesn't have the memory to fulfill all container requests.
2. You set yarn.container.count higher than the number of partitions that your input stream has.

Cheers,
Chris

On 8/15/14 1:56 PM, "Telles Nobrega" wrote:

Hi Chris,

I started playing with yarn.container.count and set it to 5.

At first I thought I had to compile the package again and republish it to HDFS, because I couldn't run 5 containers. Then I recompiled, but I still only got 3 containers. Is that normal behaviour?

Thanks.

On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega wrote:

Thanks Chris, I will take a look at these links and come back if I have more questions.

On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini wrote:

Hey Telles,

> Should I use many Kafka brokers, or will one suffice?
The number of brokers you use depends on the number of messages/sec you're going to receive, the size of those messages, and how long you're going to retain them.

Here is a good blog post on Kafka performance that should give you some idea of the numbers:

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

> It could be just one job, but what is the best way to deploy many instances of this job so I could process a heavy load of messages?

You should adjust yarn.container.count to increase the parallelism of your job. By default, you get one container, but you can adjust this up to the total number of input partitions that you have.
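Chris's two answers about yarn.container.count can be illustrated with a simplified model. The answers are that you can raise it up to the number of input partitions, and that asking for more than that won't give you more containers. In the model, each input partition becomes one task, tasks are spread round-robin over containers, and containers that would receive no task are not started. This is an illustrative sketch, not Samza's actual assignment code, and `assign_tasks` is a hypothetical helper. Under this model, Telles getting 3 containers after asking for 5 would be consistent with an input topic that has 3 partitions, although the thread never confirms the partition count.

```python
from __future__ import annotations


def assign_tasks(num_partitions: int, container_count: int) -> dict[int, list[int]]:
    """Spread one task per input partition round-robin over the requested containers.

    Simplified model: containers beyond num_partitions would have no task, so they are
    dropped rather than started empty.
    """
    if container_count < 1:
        raise ValueError("container_count must be at least 1")
    effective = min(container_count, num_partitions)
    assignment: dict[int, list[int]] = {c: [] for c in range(effective)}
    for partition in range(num_partitions):
        assignment[partition % effective].append(partition)
    return assignment


print(assign_tasks(num_partitions=3, container_count=5))
print(assign_tasks(num_partitions=8, container_count=3))
```

The first call yields only three containers even though five were requested. The second shows several partitions sharing each container when partitions outnumber containers.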
Have a look here for some details about how Samza's parallelism works:

http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html

Cheers,
Chris

On 8/13/14 9:37 AM, "Telles Nobrega" wrote:

Should I use many Kafka brokers, or will one suffice?

Thanks

On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega wrote:

It could be just one job, but what is the best way to deploy many instances of this job so I could process a heavy load of messages?

Thanks,

On 13 Aug 2014, at 01:39, Yan Fang wrote:

*"Does one Kafka broker handle this many messages per second?"*

I believe @Chris has a better answer about this.
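Chris says the broker count depends on message rate, message size, and retention, and Yan defers that question to him. That answer can be turned into a rough storage estimate: retained bytes are roughly rate times average message size times retention time times replication factor. The sketch below plugs in the 33200 messages/second from Telles's experiment. It also uses assumed values that do not come from the thread: 100-byte messages, 7 days of retention, no replication, and 1 TB of usable disk per broker. It only sizes disk. Network and I/O throughput, which the LinkedIn benchmark post discusses, also limit how much one broker can take. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def ingest_bytes_per_sec(msgs_per_sec: int, avg_msg_bytes: int, replication_factor: int = 1) -> int:
    """Bytes written to the cluster per second, counting every replica."""
    return msgs_per_sec * avg_msg_bytes * replication_factor


def retained_bytes(msgs_per_sec: int, avg_msg_bytes: int, retention_days: int,
                   replication_factor: int = 1) -> int:
    """Bytes on disk once the log has filled up to the retention limit."""
    rate = ingest_bytes_per_sec(msgs_per_sec, avg_msg_bytes, replication_factor)
    return rate * retention_days * SECONDS_PER_DAY


def brokers_for_disk(msgs_per_sec: int, avg_msg_bytes: int, retention_days: int,
                     disk_bytes_per_broker: int, replication_factor: int = 1) -> int:
    """Minimum broker count so the retained log fits on disk (integer ceiling division)."""
    total = retained_bytes(msgs_per_sec, avg_msg_bytes, retention_days, replication_factor)
    return -(-total // disk_bytes_per_broker)


# 33200 msg/s is from the thread; 100-byte messages, 7 days, and 1 TB per broker are assumptions.
print(ingest_bytes_per_sec(33200, 100))
print(retained_bytes(33200, 100, 7))
print(brokers_for_disk(33200, 100, 7, disk_bytes_per_broker=10**12))
```

Under these assumptions, the cluster ingests about 3.3 MB/s and retains roughly 2 TB of log, so disk alone calls for three such brokers. Storage grows linearly with each of the inputs, including the replication factor.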
*"I have one job that gets these messages and another that reads from the output of the first job and does some more processing."*

Why not use one job to get the messages and process them?

*"When I change a configuration of one of my jobs, do I need to recompile it and send the new tar.gz to HDFS, or just change the deploy/samza config and it should work?"*

No, you don't need to recompile. Change the config and run run-job. It will work.

Thanks.

Cheers,

Fang, Yan
+1 (206) 849-4108

On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega wrote:

Not completely related to the topic of the question, but when I change a configuration of one of my jobs, do I need to recompile it and send the new tar.gz to HDFS, or just change the deploy/samza config and it should work?
Thanks

On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega wrote:

Hi, I'm running an experiment in which I'm supposed to run Samza with different input rates. First I'm running with 420 messages/second, and I scale up to 33200 messages/second.

Does one Kafka broker handle this many messages per second? Second, what is the best way to read this many messages into Samza? I have one job that gets these messages and another that reads from the output of the first job and does some more processing. Is the best way to use more containers and split the Kafka topics into partitions (the same number as containers), or is there a better way to do this?

Thanks in advance,

--
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
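Telles's original question was whether to split the Kafka topics into partitions, one per container. Producers decide how messages are divided among partitions. With keyed messages, a common approach is to hash the key and take the result modulo the partition count, so that all messages with the same key land in the same partition and therefore in the same Samza task. The sketch below uses zlib.crc32 as a deterministic stand-in hash. Kafka's own default partitioner uses a different hash function, so the exact partition numbers will not match Kafka's. `partition_for` and `partition_counts` are hypothetical helpers.

```python
import zlib
from collections import Counter
from typing import Iterable


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition: hash(key) mod num_partitions.

    crc32 is used because Python's built-in hash() for str is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_counts(keys: Iterable[str], num_partitions: int) -> Counter:
    """Count how many of the given keys land on each partition."""
    return Counter(partition_for(key, num_partitions) for key in keys)


# Hypothetical keys spread over 5 partitions (for example, one per container).
counts = partition_counts((f"sensor-{i}" for i in range(10_000)), num_partitions=5)
for partition in sorted(counts):
    print(partition, counts[partition])
```

A repeated key always maps to the same partition, which keeps per-key ordering within a single task. Note that adding partitions later changes the key-to-partition mapping.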
