@Joe, Achanta is using Indian English numerals which is why it's a little confusing. http://en.wikipedia.org/wiki/Indian_English#Numbering_system 1,00,000 [1 lakh] (Indian English) == 100,000 [1 hundred thousand] (The rest of the world :P)
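For readers unfamiliar with the Indian grouping, the conversion is mechanical: the last three digits form one group, and every group above that has two digits. A minimal sketch (the `indianGrouping` helper here is our own illustration, not a library API):

```java
// Sketch: format a non-negative integer with Indian-English digit grouping
// (last three digits, then pairs), so 100000 -> "1,00,000" (1 lakh).
public class IndianNumerals {
    static String indianGrouping(long n) {
        String s = Long.toString(n);
        if (s.length() <= 3) return s;
        String lastThree = s.substring(s.length() - 3);
        String rest = s.substring(0, s.length() - 3);
        StringBuilder sb = new StringBuilder();
        // Group the remaining digits in pairs, working from the right.
        while (rest.length() > 2) {
            sb.insert(0, "," + rest.substring(rest.length() - 2));
            rest = rest.substring(0, rest.length() - 2);
        }
        sb.insert(0, rest);
        return sb + "," + lastThree;
    }

    public static void main(String[] args) {
        System.out.println(indianGrouping(100000));   // 1,00,000  (1 lakh)
        System.out.println(indianGrouping(200000));   // 2,00,000  (2 lakh)
        System.out.println(indianGrouping(10000000)); // 1,00,00,000 (1 crore)
    }
}
```

So "2 lakh partitions" later in the thread is 200,000 partitions, and Achanta's "1,00,000 partitions" is 100,000, not 1,000,000.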
On Fri Dec 19 2014 at 9:40:29 AM Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:

> Joe,
>
> - Correction, it's 1,00,000 partitions.
> - We can have at max only 1 consumer per partition, not 50 per partition.
> Yes, we have a hashing mechanism to support a future partition increase as
> well; we override the Default Partitioner.
> - We use both Simple and HighLevel consumers depending on the consumption
> use-case.
> - I clearly mentioned 200 TB/week, not per day.
> - We have separate producers and consumers, each operating as different
> processes on different machines.
>
> I was explaining why we may end up with so many partitions. I think the
> question about 200 TB/day got deviated.
>
> Any suggestions regarding the performance impact of the 200 TB/week?
>
> On Fri, Dec 19, 2014 at 10:53 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> > Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000
> > partitions? I think you can take what I said below and change my 250 to
> > 25, as I went with your result (1,000,000) and not your arguments
> > (2,000 x 50).
> >
> > And you should think of the processing as a separate step from the
> > fetch, and commit your offset in batch post-processing. Then you only
> > need more partitions to fetch batches to process in parallel.
> >
> > Regards, Joe Stein
> >
> > On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein <joe.st...@stealth.ly>
> > wrote:
> >
> > > See some comments inline.
> > >
> > > On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
> > > achanta.va...@flipkart.com> wrote:
> > >
> > > > We require:
> > > > - many topics
> > > > - ordering of messages for every topic
> > >
> > > Ordering is only on a per-partition basis, so you might have to pick
> > > a partition key that makes sense for what you are doing.
> > >
> > > > - Consumers hit different Http EndPoints which may be slow (in a
> > > > push model).
> > > > In case of a Pull model, consumers may pull at the rate at which
> > > > they can process.
> > > > - We need parallelism to hit with as many consumers. Hence, we
> > > > currently have around 50 consumers/topic => 50 partitions.
> > >
> > > I think you might be mixing up the fetch with the processing. You
> > > can have 1 partition and still have 50 messages being processed in
> > > parallel (as a batch of messages).
> > >
> > > What language are you working in? How are you doing this processing
> > > exactly?
> > >
> > > > Currently we have:
> > > > 2,000 topics x 50 => 1,00,000 partitions.
> > >
> > > If this is really the case then you are going to need at least 250
> > > brokers (~4,000 partitions per broker).
> > >
> > > If you do that then you are in the 200 TB per day world, which
> > > doesn't sound to be the case.
> > >
> > > I really think you need to strategize on your processing model some
> > > more.
> > >
> > > > The incoming rate of ingestion at max is 100 MB/sec. We are
> > > > planning for a big cluster with many brokers.
> > >
> > > It is possible to handle this on just 3 brokers depending on message
> > > size and ability to batch; durability is also a factor you really
> > > need to be thinking about.
> > >
> > > > We have exactly the same use cases as mentioned in this video
> > > > (usage at LinkedIn):
> > > > https://www.youtube.com/watch?v=19DvtEC0EbQ
> > > >
> > > > To handle the zookeeper scenario, as mentioned in the above video,
> > > > we are planning to use SSDs and would upgrade to the new consumer
> > > > (0.9+) once it's available, as per the below video.
> > > > https://www.youtube.com/watch?v=7TZiN521FQA
> > > >
> > > > On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar <
> > > > j_thak...@yahoo.com.invalid> wrote:
> > > >
> > > > > Technically/conceptually it is possible to have 200,000 topics,
> > > > > but do you really need it like that? What do you intend to do
> > > > > with those messages - i.e. how do you foresee them being
> > > > > processed downstream? And are those topics really there to
> > > > > segregate different kinds of processing or different ids? E.g.
> > > > > if you were LinkedIn, Facebook or Google, would you have one
> > > > > topic per user or one topic per kind of event (e.g. login,
> > > > > pageview, adview, etc.)?
> > > > >
> > > > > Remember there is significant book-keeping done within Zookeeper
> > > > > - and these many topics will make that book-keeping significant.
> > > > > As for storage, I don't think it should be an issue with
> > > > > sufficient spindles, servers and higher-than-default memory
> > > > > configuration.
> > > > >
> > > > > Jayesh
> > > > >
> > > > > From: Achanta Vamsi Subhash <achanta.va...@flipkart.com>
> > > > > To: "users@kafka.apache.org" <users@kafka.apache.org>
> > > > > Sent: Friday, December 19, 2014 9:00 AM
> > > > > Subject: Re: Max. storage for Kafka and impact
> > > > >
> > > > > Yes. We need those many max partitions as we have a central
> > > > > messaging service and thousands of topics.
> > > > >
> > > > > On Friday, December 19, 2014, nitin sharma <
> > > > > kumarsharma.ni...@gmail.com> wrote:
> > > > >
> > > > > > hi,
> > > > > >
> > > > > > Few things you have to plan for:
> > > > > > a. Ensure that, from a resilience point of view, you have
> > > > > > sufficient follower brokers for your partitions.
> > > > > > b. In my testing of Kafka (50 TB/week) so far, I haven't seen
> > > > > > much issue with CPU utilization or memory. I had 24 CPUs and
> > > > > > 32 GB RAM.
> > > > > > c. 200,000 partitions at 200 TB/week means around
> > > > > > 1 GB/week/partition.
> > > > > > Are you sure you need so many partitions?
> > > > > >
> > > > > > Regards,
> > > > > > Nitin Kumar Sharma.
> > > > > >
> > > > > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> > > > > > achanta.va...@flipkart.com> wrote:
> > > > > >
> > > > > > > We definitely need a retention policy of a week. Hence.
> > > > > > >
> > > > > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > > > > > > achanta.va...@flipkart.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > We are using Kafka for our messaging system and we have an
> > > > > > > > estimate of 200 TB/week in the coming months. Will it
> > > > > > > > impact Kafka's performance in any way?
> > > > > > > >
> > > > > > > > PS: We will be having greater than 2 lakh partitions.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards
> > > > > > > > Vamsi Subhash
>
> --
> Regards
> Vamsi Subhash
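Achanta mentions overriding the Default Partitioner with a hashing mechanism that tolerates a future partition increase. The thread doesn't show their code; one common shape for this (a sketch under assumptions — `NUM_BUCKETS` and all names here are illustrative, not Flipkart's implementation) is to hash keys into a fixed set of virtual buckets larger than any expected partition count, then map buckets onto the current partitions. A key's bucket never changes; only the bucket-to-partition mapping moves when partitions are added:

```java
// Hedged sketch of a partition-increase-tolerant hashing scheme.
// NUM_BUCKETS is fixed for the lifetime of the topic and must exceed the
// largest partition count you ever expect to use.
public class BucketPartitioner {
    static final int NUM_BUCKETS = 10_000; // illustrative assumption

    static int bucketFor(String key) {
        // Mask the sign bit rather than Math.abs(), which stays negative
        // for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % NUM_BUCKETS;
    }

    static int partitionFor(String key, int numPartitions) {
        // Stable bucket first, then map onto however many partitions
        // currently exist.
        return bucketFor(key) % numPartitions;
    }

    public static void main(String[] args) {
        int before = partitionFor("order-42", 50);
        int after = partitionFor("order-42", 100); // after a partition increase
        System.out.println("bucket: " + bucketFor("order-42")
                + ", partition@50: " + before + ", partition@100: " + after);
    }
}
```

Note that per-key ordering is still only guaranteed while the key stays on one partition, so a resize is usually done at a quiet point or with a drain step.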
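Joe's suggestion to separate fetch from processing can be sketched as follows: fetch one batch from a single partition, fan the messages out to a thread pool (e.g. 50 in-flight HTTP calls), and commit the offset only after the whole batch succeeds. Everything here is a stand-in — `process()` for the slow HTTP endpoint, `commitOffset()` for a real consumer offset commit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch: parallel processing of one fetched batch from ONE
// partition, with a single offset commit after the batch completes.
public class BatchProcessor {
    static volatile long committedOffset = 0;
    static final AtomicInteger processed = new AtomicInteger();

    static void process(long offset) {
        processed.incrementAndGet(); // stand-in for hitting an HTTP endpoint
    }

    static void commitOffset(long nextOffset) {
        committedOffset = nextOffset; // stand-in for a real offset commit
    }

    static void runBatch(long baseOffset, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            final long offset = baseOffset + i;
            futures.add(pool.submit(() -> process(offset)));
        }
        try {
            for (Future<?> f : futures) f.get(); // wait for the whole batch
        } catch (Exception e) {
            throw new RuntimeException(e); // a real consumer would retry, not commit
        }
        commitOffset(baseOffset + batchSize); // commit once, post-processing
        pool.shutdown();
    }

    public static void main(String[] args) {
        runBatch(1000, 50);
        System.out.println(processed.get());  // 50
        System.out.println(committedOffset);  // 1050
    }
}
```

With this shape, the partition count is driven by fetch throughput rather than by the number of slow downstream endpoints, which is exactly why Joe argues 50 partitions per topic may be unnecessary.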
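The capacity figures in the thread reduce to straightforward arithmetic (decimal units assumed: 1 TB = 10^12 bytes). Worth noting: 200 TB/week averages about 331 MB/sec, which is above the 100 MB/sec maximum ingest rate Achanta quotes, so the thread's two figures don't quite line up — one of them is conservative.

```java
// Back-of-the-envelope numbers from the thread. Averages only; peaks and
// replication overhead would sit on top of these.
public class CapacityMath {
    public static void main(String[] args) {
        double bytesPerWeek = 200e12;          // 200 TB/week
        double secondsPerWeek = 7 * 24 * 3600; // 604,800 s

        double mbPerSec = bytesPerWeek / secondsPerWeek / 1e6;
        System.out.printf("Average ingest: %.0f MB/s%n", mbPerSec); // ~331 MB/s

        // Per-partition volume with 2 lakh (200,000) partitions:
        double gbPerPartitionWeek = bytesPerWeek / 200_000 / 1e9;
        System.out.printf("Per partition: %.0f GB/week%n", gbPerPartitionWeek); // 1 GB

        // Joe's broker estimate at ~4,000 partitions per broker:
        System.out.println(1_000_000 / 4_000); // 250 brokers for 1,000,000 partitions
        System.out.println(100_000 / 4_000);   // 25 brokers for 1,00,000 partitions
    }
}
```

This also shows why Joe's "change my 250 to 25" correction follows directly from reading 1,00,000 as 100,000 rather than 1,000,000.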