Re: How to run multiple instances of the same job

Chris Riccomini Wed, 13 Aug 2014 12:34:14 -0700

Hey Telles,

>> Should I use many kafka brokers or one will suffice?


The number of brokers you need depends on the number of messages/sec
you're going to receive, the size of those messages, and how long you're
going to retain them.
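
As a rough sketch of that arithmetic, the snippet below turns a message
rate, an average message size, and a retention period into a write
throughput and a retained-data figure. The 33200 messages/sec is the peak
rate mentioned in this thread; the 1 KB message size, the 24-hour
retention, and the function name broker_load are assumptions made up for
illustration, so plug in your own measurements.

```python
def broker_load(messages_per_sec, avg_message_bytes, retention_hours,
                replication_factor=1):
    """Return (write throughput in MB/s, retained data in GB) for a topic."""
    bytes_per_sec = messages_per_sec * avg_message_bytes
    # Every replica receives a copy of each message.
    write_mb_per_sec = bytes_per_sec * replication_factor / 1e6
    retained_gb = (bytes_per_sec * retention_hours * 3600
                   * replication_factor / 1e9)
    return write_mb_per_sec, retained_gb


# 33200 messages/sec comes from the thread; message size and retention
# are hypothetical values chosen only to make the example concrete.
mb_per_sec, retained_gb = broker_load(33200, 1000, 24)
print(f"{mb_per_sec:.1f} MB/s written, {retained_gb:.0f} GB retained")
```

Comparing these figures against the per-machine numbers in a benchmark
for your hardware gives a first estimate of how many brokers you need.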

Here is a good blog post on Kafka performance that should give you some
idea of the numbers:

  
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

>> It could be just one job, but what is the best way to deploy many
>>instances of this job so I could process a heavy load of messages?

You should increase yarn.container.count to raise the parallelism of your
job. By default, you get one container, but you can raise this up to the
total number of input partitions that you have. Have a look here for
some details about how Samza's parallelism works:

http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
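
To make the container limit concrete, here is a small sketch that clamps
a requested container count to the number of input partitions and shows
one way the partitions could be spread across containers. The partition
count of 8, the requested count of 12, and both function names are
hypothetical, and the round-robin spread is only an illustration, not
necessarily how Samza assigns partitions internally.

```python
def effective_containers(requested, input_partitions):
    """Clamp the container count to the range 1..input_partitions."""
    return max(1, min(requested, input_partitions))


def assign_partitions(input_partitions, containers):
    """Spread partition ids over containers round-robin (illustrative only)."""
    groups = [[] for _ in range(containers)]
    for partition in range(input_partitions):
        groups[partition % containers].append(partition)
    return groups


partitions = 8  # hypothetical partition count of the input topic
containers = effective_containers(12, partitions)

# The value you would put in the job's config file.
print(f"yarn.container.count={containers}")
for index, group in enumerate(assign_partitions(partitions, containers)):
    print(f"container {index}: partitions {group}")
```

Asking for more containers than partitions gains nothing here, since the
extra containers would have no partitions to read. To go beyond that
point, add partitions to the input topic first.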




Cheers,
Chris

On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:

>Should I use many kafka brokers or one will suffice?
>
>Thanks
>
>
>On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]>
>wrote:
>
>> It could be just one job, but what is the best way to deploy many
>> instances of this job so I could process a heavy load of messages?
>>
>> Thanks,
>>
>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>
>> > *"Does one kafka-broker handle this much messages per second?"*
>> >
>> >  I believe @Chris has better answer about this.
>> >
>> >
>> >
>> > *"I have one job that get this messages and another that reads from
>>the
>> > output of the first job that does some more processing."*
>> >
>> >    Why not use one job get messages and process them?
>> >
>> > *" when I change a*
>> >
>> > *configuration of one my jobs do I need to recompile it and send the
>>new
>> > tar.gz to hdfs or just change the deploy/samza config and it should
>> work."*
>> >
>> >    No, you don't need to recompile. Change the config and run-job. It
>> will
>> > work.
>> >
>> > Thanks.
>> >
>> > Cheers,
>> >
>> > Fang, Yan
>> > [email protected]
>> > +1 (206) 849-4108
>> >
>> >
>> > On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega
>><[email protected]
>> >
>> > wrote:
>> >
>> >> Not completely related to the topic of the question but when I
>>change a
>> >> configuration of one my jobs do I need to recompile it and send the
>>new
>> >> tar.gz to hdfs or just change the deploy/samza config and it should
>> work.
>> >>
>> >> Thanks
>> >>
>> >>
>> >> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <
>> [email protected]>
>> >> wrote:
>> >>
>> >>> Hi, I'm running an experiment that I'm suppose to run samza with
>> >> different
>> >>> input rates. First I'm running with 420 messages/second and I scale
>>up
>> to
>> >>> 33200 messages/second.
>> >>>
>> >>> Does one kafka-broker handle this much messages per second?
>> >>> Second, what is the best way to read into samza this much messages?
>>I
>> >> have
>> >>> one job that get this messages and another that reads from the
>>output
>> of
>> >>> the first job that does some more processing. Is the best way to use
>> more
>> >>> containers and split kafka topics in partitions (the same number of
>> >>> containers) or is there a better way to do this.
>> >>>
>> >>> Thanks in advance,
>> >>>
>> >>> --
>> >>> ------------------------------------------
>> >>> Telles Mota Vidal Nobrega
>> >>> M.sc. Candidate at UFCG
>> >>> B.sc. in Computer Science at UFCG
>> >>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> ------------------------------------------
>> >> Telles Mota Vidal Nobrega
>> >> M.sc. Candidate at UFCG
>> >> B.sc. in Computer Science at UFCG
>> >> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >>
>>
>>
>
>
>-- 
>------------------------------------------
>Telles Mota Vidal Nobrega
>M.sc. Candidate at UFCG
>B.sc. in Computer Science at UFCG
>Software Engineer at OpenStack Project - HP/LSD-UFCG
