@Joe, Achanta is using Indian English numerals which is why it's a little confusing. http://en.wikipedia.org/wiki/Indian_English#Numbering_system 1,00,000 [1 lakh] (Indian English) == 100,000 [1 hundred thousand] (The rest of the world :P)
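For readers unfamiliar with the Indian grouping, the conversion is mechanical: the last three digits form one group, and every group above that has two digits. A minimal sketch (the `indianGrouping` helper here is our own illustration, not a library API):

```java
// Sketch: format a non-negative integer with Indian-English digit grouping
// (last three digits, then pairs), so 100000 -> "1,00,000" (1 lakh).
public class IndianNumerals {
    static String indianGrouping(long n) {
        String s = Long.toString(n);
        if (s.length() <= 3) return s;
        String lastThree = s.substring(s.length() - 3);
        String rest = s.substring(0, s.length() - 3);
        StringBuilder sb = new StringBuilder();
        // Group the remaining digits in pairs, working from the right.
        while (rest.length() > 2) {
            sb.insert(0, "," + rest.substring(rest.length() - 2));
            rest = rest.substring(0, rest.length() - 2);
        }
        sb.insert(0, rest);
        return sb + "," + lastThree;
    }

    public static void main(String[] args) {
        System.out.println(indianGrouping(100000));   // 1,00,000  (1 lakh)
        System.out.println(indianGrouping(200000));   // 2,00,000  (2 lakh)
        System.out.println(indianGrouping(10000000)); // 1,00,00,000 (1 crore)
    }
}
```

So "2 lakh partitions" later in the thread is 200,000 partitions, and Achanta's "1,00,000 partitions" is 100,000, not 1,000,000.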
On Fri Dec 19 2014 at 9:40:29 AM Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:

> Joe,
>
> - Correction, it's 1,00,000 partitions.
> - We can have at max only 1 consumer per partition, not 50 per partition.
> Yes, we have a hashing mechanism to support a future partition increase as
> well; we override the Default Partitioner.
> - We use both Simple and HighLevel consumers depending on the consumption
> use-case.
> - I clearly mentioned 200 TB/week, not per day.
> - We have separate producers and consumers, each operating as different
> processes on different machines.
>
> I was explaining why we may end up with so many partitions. I think the
> question about 200 TB/day got deviated.
>
> Any suggestions regarding the performance impact of the 200 TB/week?
>
> On Fri, Dec 19, 2014 at 10:53 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> > Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000
> > partitions? I think you can take what I said below and change my 250 to
> > 25, as I went with your result (1,000,000) and not your arguments
> > (2,000 x 50).
> >
> > And you should think of the processing as a separate step from the
> > fetch, and commit your offset in batch post-processing. Then you only
> > need more partitions to fetch batches to process in parallel.
> >
> > Regards, Joe Stein
> >
> > On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein <joe.st...@stealth.ly>
> > wrote:
> >
> > > See some comments inline.
> > >
> > > On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
> > > achanta.va...@flipkart.com> wrote:
> > >
> > > > We require:
> > > > - many topics
> > > > - ordering of messages for every topic
> > >
> > > Ordering is only on a per-partition basis, so you might have to pick
> > > a partition key that makes sense for what you are doing.
> > >
> > > > - Consumers hit different Http EndPoints which may be slow (in a
> > > > push model).
> > > > In case of a Pull model, consumers may pull at the rate at which
> > > > they can process.
> > > > - We need parallelism to hit with as many consumers. Hence, we
> > > > currently have around 50 consumers/topic => 50 partitions.
> > >
> > > I think you might be mixing up the fetch with the processing. You
> > > can have 1 partition and still have 50 messages being processed in
> > > parallel (as a batch of messages).
> > >
> > > What language are you working in? How are you doing this processing
> > > exactly?
> > >
> > > > Currently we have:
> > > > 2,000 topics x 50 => 1,00,000 partitions.
> > >
> > > If this is really the case then you are going to need at least 250
> > > brokers (~4,000 partitions per broker).
> > >
> > > If you do that then you are in the 200 TB per day world, which
> > > doesn't sound to be the case.
> > >
> > > I really think you need to strategize on your processing model some
> > > more.
> > >
> > > > The incoming rate of ingestion at max is 100 MB/sec. We are
> > > > planning for a big cluster with many brokers.
> > >
> > > It is possible to handle this on just 3 brokers depending on message
> > > size and ability to batch; durability is also a factor you really
> > > need to be thinking about.
> > >
> > > > We have exactly the same use cases as mentioned in this video
> > > > (usage at LinkedIn):
> > > > https://www.youtube.com/watch?v=19DvtEC0EbQ
> > > >
> > > > To handle the zookeeper scenario, as mentioned in the above video,
> > > > we are planning to use SSDs and would upgrade to the new consumer
> > > > (0.9+) once it's available, as per the below video.
> > > > https://www.youtube.com/watch?v=7TZiN521FQA
> > > >
> > > > On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar <
> > > > j_thak...@yahoo.com.invalid> wrote:
> > > >
> > > > > Technically/conceptually it is possible to have 200,000 topics,
> > > > > but do you really need it like that? What do you intend to do
> > > > > with those messages - i.e. how do you foresee them being
> > > > > processed downstream? And are those topics really there to
> > > > > segregate different kinds of processing or different ids? E.g.
> > > > > if you were LinkedIn, Facebook or Google, would you have one
> > > > > topic per user or one topic per kind of event (e.g. login,
> > > > > pageview, adview, etc.)?
> > > > >
> > > > > Remember there is significant book-keeping done within Zookeeper
> > > > > - and these many topics will make that book-keeping significant.
> > > > > As for storage, I don't think it should be an issue with
> > > > > sufficient spindles, servers and higher-than-default memory
> > > > > configuration.
> > > > >
> > > > > Jayesh
> > > > >
> > > > > From: Achanta Vamsi Subhash <achanta.va...@flipkart.com>
> > > > > To: "users@kafka.apache.org" <users@kafka.apache.org>
> > > > > Sent: Friday, December 19, 2014 9:00 AM
> > > > > Subject: Re: Max. storage for Kafka and impact
> > > > >
> > > > > Yes. We need those many max partitions as we have a central
> > > > > messaging service and thousands of topics.
> > > > >
> > > > > On Friday, December 19, 2014, nitin sharma <
> > > > > kumarsharma.ni...@gmail.com> wrote:
> > > > >
> > > > > > hi,
> > > > > >
> > > > > > Few things you have to plan for:
> > > > > > a. Ensure that, from a resilience point of view, you have
> > > > > > sufficient follower brokers for your partitions.
> > > > > > b. In my testing of Kafka (50 TB/week) so far, I haven't seen
> > > > > > much issue with CPU utilization or memory. I had 24 CPUs and
> > > > > > 32 GB RAM.
> > > > > > c. 200,000 partitions at 200 TB/week means around
> > > > > > 1 GB/week/partition.
> > > > > > Are you sure you need so many partitions?
> > > > > >
> > > > > > Regards,
> > > > > > Nitin Kumar Sharma.
> > > > > >
> > > > > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> > > > > > achanta.va...@flipkart.com> wrote:
> > > > > >
> > > > > > > We definitely need a retention policy of a week. Hence.
> > > > > > >
> > > > > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > > > > > > achanta.va...@flipkart.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > We are using Kafka for our messaging system and we have an
> > > > > > > > estimate of 200 TB/week in the coming months. Will it
> > > > > > > > impact Kafka's performance in any way?
> > > > > > > >
> > > > > > > > PS: We will be having greater than 2 lakh partitions.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards
> > > > > > > > Vamsi Subhash
>
> --
> Regards
> Vamsi Subhash
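Achanta mentions overriding the Default Partitioner with a hashing mechanism that tolerates a future partition increase. The thread doesn't show their code; one common shape for this (a sketch under assumptions — `NUM_BUCKETS` and all names here are illustrative, not Flipkart's implementation) is to hash keys into a fixed set of virtual buckets larger than any expected partition count, then map buckets onto the current partitions. A key's bucket never changes; only the bucket-to-partition mapping moves when partitions are added:

```java
// Hedged sketch of a partition-increase-tolerant hashing scheme.
// NUM_BUCKETS is fixed for the lifetime of the topic and must exceed the
// largest partition count you ever expect to use.
public class BucketPartitioner {
    static final int NUM_BUCKETS = 10_000; // illustrative assumption

    static int bucketFor(String key) {
        // Mask the sign bit rather than Math.abs(), which stays negative
        // for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % NUM_BUCKETS;
    }

    static int partitionFor(String key, int numPartitions) {
        // Stable bucket first, then map onto however many partitions
        // currently exist.
        return bucketFor(key) % numPartitions;
    }

    public static void main(String[] args) {
        int before = partitionFor("order-42", 50);
        int after = partitionFor("order-42", 100); // after a partition increase
        System.out.println("bucket: " + bucketFor("order-42")
                + ", partition@50: " + before + ", partition@100: " + after);
    }
}
```

Note that per-key ordering is still only guaranteed while the key stays on one partition, so a resize is usually done at a quiet point or with a drain step.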
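Joe's suggestion to separate fetch from processing can be sketched as follows: fetch one batch from a single partition, fan the messages out to a thread pool (e.g. 50 in-flight HTTP calls), and commit the offset only after the whole batch succeeds. Everything here is a stand-in — `process()` for the slow HTTP endpoint, `commitOffset()` for a real consumer offset commit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch: parallel processing of one fetched batch from ONE
// partition, with a single offset commit after the batch completes.
public class BatchProcessor {
    static volatile long committedOffset = 0;
    static final AtomicInteger processed = new AtomicInteger();

    static void process(long offset) {
        processed.incrementAndGet(); // stand-in for hitting an HTTP endpoint
    }

    static void commitOffset(long nextOffset) {
        committedOffset = nextOffset; // stand-in for a real offset commit
    }

    static void runBatch(long baseOffset, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            final long offset = baseOffset + i;
            futures.add(pool.submit(() -> process(offset)));
        }
        try {
            for (Future<?> f : futures) f.get(); // wait for the whole batch
        } catch (Exception e) {
            throw new RuntimeException(e); // a real consumer would retry, not commit
        }
        commitOffset(baseOffset + batchSize); // commit once, post-processing
        pool.shutdown();
    }

    public static void main(String[] args) {
        runBatch(1000, 50);
        System.out.println(processed.get());  // 50
        System.out.println(committedOffset);  // 1050
    }
}
```

With this shape, the partition count is driven by fetch throughput rather than by the number of slow downstream endpoints, which is exactly why Joe argues 50 partitions per topic may be unnecessary.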
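The capacity figures in the thread reduce to straightforward arithmetic (decimal units assumed: 1 TB = 10^12 bytes). Worth noting: 200 TB/week averages about 331 MB/sec, which is above the 100 MB/sec maximum ingest rate Achanta quotes, so the thread's two figures don't quite line up — one of them is conservative.

```java
// Back-of-the-envelope numbers from the thread. Averages only; peaks and
// replication overhead would sit on top of these.
public class CapacityMath {
    public static void main(String[] args) {
        double bytesPerWeek = 200e12;          // 200 TB/week
        double secondsPerWeek = 7 * 24 * 3600; // 604,800 s

        double mbPerSec = bytesPerWeek / secondsPerWeek / 1e6;
        System.out.printf("Average ingest: %.0f MB/s%n", mbPerSec); // ~331 MB/s

        // Per-partition volume with 2 lakh (200,000) partitions:
        double gbPerPartitionWeek = bytesPerWeek / 200_000 / 1e9;
        System.out.printf("Per partition: %.0f GB/week%n", gbPerPartitionWeek); // 1 GB

        // Joe's broker estimate at ~4,000 partitions per broker:
        System.out.println(1_000_000 / 4_000); // 250 brokers for 1,000,000 partitions
        System.out.println(100_000 / 4_000);   // 25 brokers for 1,00,000 partitions
    }
}
```

This also shows why Joe's "change my 250 to 25" correction follows directly from reading 1,00,000 as 100,000 rather than 1,000,000.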