Hey Garry,

Thanks for the good advice. I'm definitely going to read the Confluent blog
post.
I actually went back and re-read the section on containers before reading
your reply and realized I was mixing up jobs and containers.
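For anyone hitting the same confusion, here is the hierarchy as I now understand it, as a rough sketch. The helper below is made up for illustration; it is not part of Samza's API:

```python
# Rough sketch of the Samza job -> container -> task hierarchy.
# This helper is illustrative only; it is not part of Samza's API.
def tasks_per_container(input_partitions: int, container_count: int) -> list:
    """A job gets one task per input partition; those tasks are then
    distributed across the job's containers as evenly as possible."""
    base, extra = divmod(input_partitions, container_count)
    return [base + (1 if i < extra else 0) for i in range(container_count)]

# 500 input partitions spread over 8 containers:
print(tasks_per_container(500, 8))  # -> [63, 63, 63, 63, 62, 62, 62, 62]
```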


Thanks,
Michael

On Thu, May 21, 2015 at 10:50 PM, Garry Turkington <
g.turking...@improvedigital.com> wrote:

> Hi,
>
> The other variable to think about here is the task-to-container mapping.
> Each job will indeed have 1 task per input partition in the underlying
> topic, but you can then spread those 500 task instances across multiple
> containers in your YARN grid:
>
>
> http://samza.apache.org/learn/documentation/0.9/container/samza-container.html
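As a minimal sketch of the relevant job configuration, assuming the 0.9 property names from that page (the container count here is made up for illustration):

```properties
# Run the job on YARN (property names per the Samza 0.9 docs)
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
# Spread the job's tasks (one per input partition) across 8 containers
yarn.container.count=8
```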
>
> I'd also suggest thinking about throughput requirements in terms of both
> the Kafka and Samza perspectives. Great blog post from one of the Confluent
> guys here:
>
>
> http://blog.confluent.io/2015/03/12/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>
> For me, I have Kafka brokers with JBOD disks, so my starting partition
> count was number of brokers * number of disks per broker. I went with
> that, discovered it wasn't hitting my needed throughput, and had to raise
> it several times. Currently I have around 160 partitions for the
> high-throughput topics (700K msgs/sec) on a 5-broker cluster.
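For concreteness, the arithmetic behind those numbers; note the disks-per-broker figure is an assumption, since it isn't stated above:

```python
# Back-of-the-envelope check of the figures above (illustrative only).
brokers = 5
disks_per_broker = 4                 # assumed JBOD layout; not stated above
starting_partitions = brokers * disks_per_broker
print(starting_partitions)           # -> 20

# Per-partition rate implied by the current setup:
current_partitions = 160
peak_msgs_per_sec = 700_000
print(peak_msgs_per_sec // current_partitions)  # -> 4375
```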
>
> So over-partitioning is a good thing: it gives additional throughput and
> more flexible room to grow. But if you ramp up too soon, you will likely
> also need to grow your job's container count. Look at your throughput
> requirements and hardware assets, pick a starting point, and test your
> assumptions. If you are like me, you'll find most of them are wrong. :)
>
> Garry
>
> -----Original Message-----
> From: Lukas Steiblys [mailto:lu...@doubledutch.me]
> Sent: 21 May 2015 20:20
> To: dev@samza.apache.org
> Subject: Re: Number of partitions
>
> Each job will get all the partitions and each task (500 of them) within
> the job will get 1 partition. So there will be 500 processes working
> through the log.
>
> I'd try to figure out what your scaling needs are for the next 2-3 years
> and then calculate your resource requirements accordingly (how many
> parallel-executing tasks you would need). If you later need to split
> partitions, it is not trivial, but doable.
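One way to sketch that calculation; the helper and the sample rates below are hypothetical, not from any measurement in this thread:

```python
import math

# Hypothetical capacity-planning helper; the figures below are made up.
def partitions_needed(target_msgs_per_sec, per_partition_msgs_per_sec,
                      headroom=2.0):
    """Partitions (and hence parallel Samza tasks) needed to sustain a
    target rate, with a growth safety factor, rounded up."""
    return math.ceil(target_msgs_per_sec * headroom
                     / per_partition_msgs_per_sec)

# e.g. 50K msgs/s expected, ~5K msgs/s measured per partition, 2x headroom:
print(partitions_needed(50_000, 5_000))  # -> 20
```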
>
> Lukas
>
> -----Original Message-----
> From: Michael Ravits
> Sent: Thursday, May 21, 2015 11:17 AM
> To: dev@samza.apache.org
> Subject: Re: Number of partitions
>
> Well, since the number of partitions can't be changed after the system
> starts running, I wanted the flexibility to grow a lot without stopping
> for an upgrade.
> I just wonder what would be a tolerable number for Samza.
> For example, if I started with 5 jobs, each would get 100 partitions. Is
> this reasonable? Or too much for a single job instance?
>
> On Thu, May 21, 2015 at 7:46 PM, Lukas Steiblys <lu...@doubledutch.me>
> wrote:
>
> > 500 is a bit extreme unless you're planning on running the job on some
> > 200 machines and trying to exploit their full power. I personally run 4
> > in production for our system processing 100 messages/s, and there's
> > plenty of room to grow.
> >
> > Lukas
> >
> > On Thursday, May 21, 2015, Michael Ravits <michaelr...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I wonder what considerations I need to account for with regard to the
> > > number of partitions in input topics for Samza.
> > > When testing a topic with 500 partitions with one Samza job, I
> > > noticed that the start-up time was very long.
> > > Are there any problems that might occur when dealing with this
> > > number of partitions?
> > >
> > > Thanks,
> > > Michael
> > >
> >
>
>
>
