Re: Regarding Kafka connect task to partition relationship for both source and sink connectors
For sink connectors, I believe you can scale up the tasks to match the partitions on the topic. But I don't believe this is the case for source connectors; the number of partitions on the topic you're producing to has nothing to do with the number of connector tasks. It really depends on the individual source connector and whether the source data type can benefit from multiple tasks. For example, the JDBC source connector (a very popular connector) only supports 1 task, even if you're querying multiple tables. Bottom line: you'll need to check the documentation for the connector in question to see if it supports multiple tasks. Alex On Thu, May 30, 2024 at 7:51 AM Sébastien Rebecchi wrote: > Hello > > Confirmed. The partition is the minimal granularity level, so having more > consumers than the number of partitions of a topic for the same consumer > group is useless; having P partitions means maximum parallelism is reached > with P consumers. > > Regards, > > Sébastien. > > On Thu, May 30, 2024 at 14:43, Yeikel Santana wrote: > > > Hi everyone, > > > > > > From my understanding, if a topic has n partitions, we can create up to n > > tasks for both the source and sink connectors to achieve the maximum > > parallelism. Adding more tasks would not be beneficial, as they would > > remain idle, limited by the number of partitions of the topic. > > > > > > Could you please confirm whether this understanding is correct? > > > > > > If this understanding is incorrect, could you please explain the > > relationship, if any? > > > > > > Thank you! > > > > > > >
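For reference, a minimal Java sketch of checking a topic's partition count before picking tasks.max for a sink connector. The broker address ("localhost:9092") and topic name ("orders") are made-up placeholders, and allTopicNames() assumes a 3.1+ client (older clients expose the same map via all()). A sink connector's tasks form a consumer group over the topic, so tasks beyond the partition count would simply receive no partitions:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class SinkTaskSizing {
    public static void main(String[] args) throws Exception {
        // Hypothetical broker address, for illustration only.
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Look up the partition count of the (hypothetical) topic the sink reads from.
            TopicDescription desc = admin.describeTopics(Collections.singletonList("orders"))
                    .allTopicNames().get().get("orders");
            int partitions = desc.partitions().size();
            // For a sink connector, setting tasks.max above this adds no parallelism.
            System.out.println("Set tasks.max to at most " + partitions);
        }
    }
}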
Re: Regarding Kafka connect task to partition relationship for both source and sink connectors
Hello Confirmed. The partition is the minimal granularity level, so having more consumers than the number of partitions of a topic for the same consumer group is useless; having P partitions means maximum parallelism is reached with P consumers. Regards, Sébastien. On Thu, May 30, 2024 at 14:43, Yeikel Santana wrote: > Hi everyone, > > > From my understanding, if a topic has n partitions, we can create up to n > tasks for both the source and sink connectors to achieve the maximum > parallelism. Adding more tasks would not be beneficial, as they would > remain idle, limited by the number of partitions of the topic. > > > Could you please confirm whether this understanding is correct? > > > If this understanding is incorrect, could you please explain the > relationship, if any? > > > Thank you! > > >
Re: Regarding kafka 2.3.0
1. Is Kafka 2.3.0 going end of life? If yes, what is the expected date? -> Kafka supports the last 3 releases. REF: https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan#TimeBasedReleasePlan-WhatIsOurEOLPolicy 2. Is Kafka 3.1.0 backward compatible with 2.3.0? -> Since the path from 2.3 to 3.1 crosses one major release (through 3.0), some deprecated features have been removed. You can refer to this doc for the upgrade guide: https://kafka.apache.org/documentation/#upgrade_3_1_0 , and check the release notes for each release. Thanks Luke On Thu, Aug 25, 2022 at 3:42 PM Fred Bai wrote: > +1 > Me too. We are considering upgrading Kafka from 2.X to 3.X, but don't know > about the compatibility. > > thx > > On Tue, Aug 23, 2022 at 22:21, Ankit Saran wrote: > > Hi Team, > > We are planning to upgrade the Kafka version from 2.3.0 to 3.1.0. We have > > the below queries regarding the same: > > > > 1. Is Kafka 2.3.0 going end of life? If yes, what is the expected date? > > 2. Is Kafka 3.1.0 backward compatible with 2.3.0? > > > > Please help us with the above queries. Thanks in advance. > > > > Regards, > > Ankit Saran > > >
Re: Regarding kafka 2.3.0
+1 Me too. We are considering upgrading Kafka from 2.X to 3.X, but don't know about the compatibility. thx On Tue, Aug 23, 2022 at 22:21, Ankit Saran wrote: > Hi Team, > We are planning to upgrade the Kafka version from 2.3.0 to 3.1.0. We have > the below queries regarding the same: > > 1. Is Kafka 2.3.0 going end of life? If yes, what is the expected date? > 2. Is Kafka 3.1.0 backward compatible with 2.3.0? > > Please help us with the above queries. Thanks in advance. > > Regards, > Ankit Saran >
Re: Regarding Kafka Consumer
Thanks, Faraz. I am using its Java API. It does not seem to provide such a method on the consumer. On Wed, Nov 22, 2017 at 2:45 PM, Faraz Mateen wrote: > Not sure which client you are using. > In kafka-python, consumer.config returns a dictionary with all consumer > properties. > > Thanks, > Faraz > > On Mon, Nov 20, 2017 at 5:34 PM, simarpreet kaur > wrote: > >> Hello team, >> >> I wanted to know if there is some way I can retrieve consumer properties >> from the Kafka consumer. For example, at runtime I may want to know the >> group id of a particular consumer, in case multiple consumers are running >> in my application. >> >> Thanks & Regards, >> Simarpreet >> > >
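For reference, a minimal Java sketch of the two usual options, with made-up config values: keep the Properties object you constructed and read it back, or, on newer clients (roughly 2.5+), ask the consumer for its group id via groupMetadata(). The Java client has no direct equivalent of kafka-python's consumer.config dump:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerConfigLookup {
    public static void main(String[] args) {
        // Hypothetical settings, for illustration only.
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Option 1: read back the Properties you built yourself.
            String fromProps = props.getProperty(ConsumerConfig.GROUP_ID_CONFIG);
            // Option 2 (clients ~2.5 and later): ask the consumer for its group metadata.
            String fromConsumer = consumer.groupMetadata().groupId();
            System.out.println(fromProps + " / " + fromConsumer);
        }
    }
}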
Re: Regarding kafka-manager topics parameters
Images didn't come through. Consider using a third-party image-hosting website. On Tue, Oct 17, 2017 at 9:36 PM, Pavan Patani wrote: > Hello, > > Previously I was using an old version of Kafka-manager and it was showing > the "Producer Message/Sec" and "Summed Recent Offsets" parameters for topics, as > below. > > [image: Inline image 1] > Currently I have installed kafka-manager-1.3.3.14 and now I cannot see > these two "Producer Message/Sec" and "Summed Recent Offsets" parameters for > topics, as below. > > [image: Inline image 2] > > Could you please guide me on how to add these two columns? > > Regards, > Pavan Patani > > >
Re: Regarding Kafka
Yeah that I realized and later read that in Java Kafka consumer there is one thread, that's why such behavior does not arise there. May be if I need to restrict my application to a single threaded :( in order to achieve that.. I need to ask Magnus Edenhill who is librdkafka expert.. Thanks for your time Hans Abhi On Sun, Oct 9, 2016 at 10:08 PM, Hans Jespersen <h...@confluent.io> wrote: > I'm pretty sure Jun was talking about the Java API in the quoted blog > text, not librdkafka. There is only one thread in the new Java consumer so > you wouldn't see this behavior. I do not think that librdkafka makes any > such guarantee to dispatch unique keys to each thread but I'm not an expert > in librdkafka so others may be about to help you better on that. > //h...@confluent.io > Original message From: Abhit Kalsotra <abhit...@gmail.com> > Date: 10/9/16 3:58 AM (GMT-08:00) To: users@kafka.apache.org Subject: > Re: Regarding Kafka > I did that but i am getting confusing results > > e.g > > I have created 4 Kafka Consumer threads for doing data analytic, these > threads just wait for Kafka messages to get consumed and > I have provided the key provided when I produce, it means that all the > messages will go to one single partition ref " > http://www.confluent.io/blog/how-to-choose-the-number-of- > topicspartitions-in-a-kafka-cluster/ > " > "* On the consumer side, Kafka always gives a single partition’s data to > one consumer thread.*" > > If you see my application logs, my 4 Kafka Consumer Application threads > which are calling consume() , Arn't all message of a particular ID should > be consumed by one Kafka Application thread ? > > [2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 74 ][ID ID > date:2016-09-28 20:07:32.000 ] > [2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4496 offset: 80 ][ID ID > date: 2016-09-28 20:07:39.000 ] > [2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4495 offset: 77 ][ID > date: 2016-09-28 20:07:35.000 ] > [2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 76][ID > date: 2016-09-28 20:07:34.000 ] > [2016-10-08 23:37:07.498]AxThreadId 9540 ->ID:4495 offset: 75 ][ID > date: 2016-09-28 20:07:33.000 ] > [2016-10-08 23:37:07.499]AxThreadId 23516 ->ID:4495 offset: 78 ][ID > date: 2016-09-28 20:07:36.000 ] > [2016-10-08 23:37:07.499]AxThreadId 2208 ->ID:4495 offset: 79 ][ID > date: 2016-09-28 20:07:37.000 ] > [2016-10-08 23:37:07.499]AxThreadId 9540 ->ID:4495 offset: 80 ][ID > date: 2016-09-28 20:07:38.000 ] > [2016-10-08 23:37:07.500]AxThreadId 23516 ->ID:4495 offset: 81][ID > date: 2016-09-28 20:07:39.000 ] > > > > > On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote: > > > Then publish with the user ID as the key and all messages for the same > key > > will be guaranteed to go to the same partition and therefore be in order > > for whichever consumer gets that partition. > > > > > > //h...@confluent.io > > Original message From: Abhit Kalsotra < > abhit...@gmail.com> > > Date: 10/9/16 12:39 AM (GMT-08:00) To: users@kafka.apache.org Subject: > > Re: Regarding Kafka > > What about the order of message getting received ? If i don't mention the > > partition. > > > > Lets say if i have user ID :4456 and I have to do some analytics at the > > Kafka Consumer end and at my consumer end if its not getting consumed the > > way I sent, then my analytics will go haywire. 
> > > > Abhi > > > > On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> > wrote: > > > > > You don't even have to do that because the default partitioner will > > spread > > > the data you publish to the topic over the available partitions for > you. > > > Just try it out to see. Publish multiple messages to the topic without > > > using keys, and without specifying a partition, and observe that they > are > > > automatically distributed out over the available partitions. > > > > > > > > > //h...@confluent.io > > > Original message From: Abhit Kalsotra < > > abhit...@gmail.com> > > > Date: 10/8/16 11:19 PM (GMT-08:00) To: users@kafka.apache.org > Subject: > > > Re: Regarding Kafka > > > Hans > > > > > > Thanks for the response, yeah you can say yeah I am treating topics > like > > > partitions, because my > > > > > > current logic of producing to a respective topic goes something like > this > > > > > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_ &
Re: Regarding Kafka
I'm pretty sure Jun was talking about the Java API in the quoted blog text, not librdkafka. There is only one thread in the new Java consumer so you wouldn't see this behavior. I do not think that librdkafka makes any such guarantee to dispatch unique keys to each thread but I'm not an expert in librdkafka so others may be about to help you better on that. //h...@confluent.io Original message From: Abhit Kalsotra <abhit...@gmail.com> Date: 10/9/16 3:58 AM (GMT-08:00) To: users@kafka.apache.org Subject: Re: Regarding Kafka I did that but i am getting confusing results e.g I have created 4 Kafka Consumer threads for doing data analytic, these threads just wait for Kafka messages to get consumed and I have provided the key provided when I produce, it means that all the messages will go to one single partition ref " http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ " "* On the consumer side, Kafka always gives a single partition’s data to one consumer thread.*" If you see my application logs, my 4 Kafka Consumer Application threads which are calling consume() , Arn't all message of a particular ID should be consumed by one Kafka Application thread ? [2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 74 ][ID ID date:2016-09-28 20:07:32.000 ] [2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4496 offset: 80 ][ID ID date: 2016-09-28 20:07:39.000 ] [2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4495 offset: 77 ][ID date: 2016-09-28 20:07:35.000 ] [2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 76][ID date: 2016-09-28 20:07:34.000 ] [2016-10-08 23:37:07.498]AxThreadId 9540 ->ID:4495 offset: 75 ][ID date: 2016-09-28 20:07:33.000 ] [2016-10-08 23:37:07.499]AxThreadId 23516 ->ID:4495 offset: 78 ][ID date: 2016-09-28 20:07:36.000 ] [2016-10-08 23:37:07.499]AxThreadId 2208 ->ID:4495 offset: 79 ][ID date: 2016-09-28 20:07:37.000 ] [2016-10-08 23:37:07.499]AxThreadId 9540 ->ID:4495 offset: 80 ][ID date: 2016-09-28 20:07:38.000 ] [2016-10-08 23:37:07.500]AxThreadId 23516 ->ID:4495 offset: 81][ID date: 2016-09-28 20:07:39.000 ] On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote: > Then publish with the user ID as the key and all messages for the same key > will be guaranteed to go to the same partition and therefore be in order > for whichever consumer gets that partition. > > > //h...@confluent.io > Original message From: Abhit Kalsotra <abhit...@gmail.com> > Date: 10/9/16 12:39 AM (GMT-08:00) To: users@kafka.apache.org Subject: > Re: Regarding Kafka > What about the order of message getting received ? If i don't mention the > partition. > > Lets say if i have user ID :4456 and I have to do some analytics at the > Kafka Consumer end and at my consumer end if its not getting consumed the > way I sent, then my analytics will go haywire. > > Abhi > > On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote: > > > You don't even have to do that because the default partitioner will > spread > > the data you publish to the topic over the available partitions for you. > > Just try it out to see. Publish multiple messages to the topic without > > using keys, and without specifying a partition, and observe that they are > > automatically distributed out over the available partitions. 
> > > > > > //h...@confluent.io > > Original message From: Abhit Kalsotra < > abhit...@gmail.com> > > Date: 10/8/16 11:19 PM (GMT-08:00) To: users@kafka.apache.org Subject: > > Re: Regarding Kafka > > Hans > > > > Thanks for the response, yeah you can say yeah I am treating topics like > > partitions, because my > > > > current logic of producing to a respective topic goes something like this > > > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_ > > kafkaTopic[whichTopic], > > > partition, > > > > RdKafka::Producer::RK_MSG_COPY, > > ptr, > > size, > > > > , > > NULL); > > where partitionKey is unique number or userID, so what I am doing > currently > > each partitionKey%10 > > so whats so ever is the remainder, I am dumping that to the respective > > topic. > > > > But as per your suggestion, Let me create close to 40-50 partitions for a > > single topic and when i am producing I do something like this > > > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic, > > > > partition%(50), > &
Re: Regarding Kafka
I did that but i am getting confusing results e.g I have created 4 Kafka Consumer threads for doing data analytic, these threads just wait for Kafka messages to get consumed and I have provided the key provided when I produce, it means that all the messages will go to one single partition ref " http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ " "* On the consumer side, Kafka always gives a single partition’s data to one consumer thread.*" If you see my application logs, my 4 Kafka Consumer Application threads which are calling consume() , Arn't all message of a particular ID should be consumed by one Kafka Application thread ? [2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 74 ][ID ID date:2016-09-28 20:07:32.000 ] [2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4496 offset: 80 ][ID ID date: 2016-09-28 20:07:39.000 ] [2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4495 offset: 77 ][ID date: 2016-09-28 20:07:35.000 ] [2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 76][ID date: 2016-09-28 20:07:34.000 ] [2016-10-08 23:37:07.498]AxThreadId 9540 ->ID:4495 offset: 75 ][ID date: 2016-09-28 20:07:33.000 ] [2016-10-08 23:37:07.499]AxThreadId 23516 ->ID:4495 offset: 78 ][ID date: 2016-09-28 20:07:36.000 ] [2016-10-08 23:37:07.499]AxThreadId 2208 ->ID:4495 offset: 79 ][ID date: 2016-09-28 20:07:37.000 ] [2016-10-08 23:37:07.499]AxThreadId 9540 ->ID:4495 offset: 80 ][ID date: 2016-09-28 20:07:38.000 ] [2016-10-08 23:37:07.500]AxThreadId 23516 ->ID:4495 offset: 81][ID date: 2016-09-28 20:07:39.000 ] On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote: > Then publish with the user ID as the key and all messages for the same key > will be guaranteed to go to the same partition and therefore be in order > for whichever consumer gets that partition. > > > //h...@confluent.io > Original message From: Abhit Kalsotra <abhit...@gmail.com> > Date: 10/9/16 12:39 AM (GMT-08:00) To: users@kafka.apache.org Subject: > Re: Regarding Kafka > What about the order of message getting received ? If i don't mention the > partition. > > Lets say if i have user ID :4456 and I have to do some analytics at the > Kafka Consumer end and at my consumer end if its not getting consumed the > way I sent, then my analytics will go haywire. > > Abhi > > On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote: > > > You don't even have to do that because the default partitioner will > spread > > the data you publish to the topic over the available partitions for you. > > Just try it out to see. Publish multiple messages to the topic without > > using keys, and without specifying a partition, and observe that they are > > automatically distributed out over the available partitions. 
> > > > > > //h...@confluent.io > > Original message From: Abhit Kalsotra < > abhit...@gmail.com> > > Date: 10/8/16 11:19 PM (GMT-08:00) To: users@kafka.apache.org Subject: > > Re: Regarding Kafka > > Hans > > > > Thanks for the response, yeah you can say yeah I am treating topics like > > partitions, because my > > > > current logic of producing to a respective topic goes something like this > > > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_ > > kafkaTopic[whichTopic], > > > partition, > > > > RdKafka::Producer::RK_MSG_COPY, > > ptr, > > size, > > > > , > > NULL); > > where partitionKey is unique number or userID, so what I am doing > currently > > each partitionKey%10 > > so whats so ever is the remainder, I am dumping that to the respective > > topic. > > > > But as per your suggestion, Let me create close to 40-50 partitions for a > > single topic and when i am producing I do something like this > > > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic, > > > > partition%(50), > > > > RdKafka::Producer::RK_MSG_COPY, > > ptr, > > size, > > > > , > > NULL); > > > > Abhi > > > > On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> > wrote: > > > > > Why do you have 10 topics? It seems like you are treating topics like > > > parti
Re: Regarding Kafka
Then publish with the user ID as the key and all messages for the same key will be guaranteed to go to the same partition and therefore be in order for whichever consumer gets that partition. //h...@confluent.io Original message From: Abhit Kalsotra <abhit...@gmail.com> Date: 10/9/16 12:39 AM (GMT-08:00) To: users@kafka.apache.org Subject: Re: Regarding Kafka What about the order of message getting received ? If i don't mention the partition. Lets say if i have user ID :4456 and I have to do some analytics at the Kafka Consumer end and at my consumer end if its not getting consumed the way I sent, then my analytics will go haywire. Abhi On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote: > You don't even have to do that because the default partitioner will spread > the data you publish to the topic over the available partitions for you. > Just try it out to see. Publish multiple messages to the topic without > using keys, and without specifying a partition, and observe that they are > automatically distributed out over the available partitions. > > > //h...@confluent.io > Original message From: Abhit Kalsotra <abhit...@gmail.com> > Date: 10/8/16 11:19 PM (GMT-08:00) To: users@kafka.apache.org Subject: > Re: Regarding Kafka > Hans > > Thanks for the response, yeah you can say yeah I am treating topics like > partitions, because my > > current logic of producing to a respective topic goes something like this > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_ > kafkaTopic[whichTopic], > partition, > > RdKafka::Producer::RK_MSG_COPY, > ptr, > size, > > , > NULL); > where partitionKey is unique number or userID, so what I am doing currently > each partitionKey%10 > so whats so ever is the remainder, I am dumping that to the respective > topic. > > But as per your suggestion, Let me create close to 40-50 partitions for a > single topic and when i am producing I do something like this > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic, > > partition%(50), > > RdKafka::Producer::RK_MSG_COPY, > ptr, > size, > > , > NULL); > > Abhi > > On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote: > > > Why do you have 10 topics? It seems like you are treating topics like > > partitions and it's unclear why you don't just have 1 topic with 10, 20, > or > > even 30 partitions. Ordering is only guaranteed at a partition level. > > > > In general if you want to capacity plan for partitions you benchmark a > > single partition and then divide your peak estimated throughput by the > > results of the single partition results. > > > > If you expect the peak throughput to increase over time then double your > > partition count to allow room to grow the number of consumers without > > having to repartition. > > > > Sizing can be a bit more tricky if you are using keys but it doesn't > sound > > like you are if today you are publishing to topics the way you describe. > > > > -hans > > > > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote: > > > > > > Guys any views ? > > > > > > Abhi > > > > > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com> > > wrote: > > >> > > >> Hello > > >> > > >> I am using librdkafka c++ library for my application . > > >> > > >> *My Kafka Cluster Set up* > > >> 2 Kafka Zookeper running on 2 different instances > > >> 7 Kafka Brokers , 4 Running on 1 machine and 3 running on other > machine > > >> Total 10 Topics and partition count is 3 with replication factor of 3. 
> > >> > > >> Now in my case I need to be very specific for the *message order* > when I > > >> am consuming the messages. I know if all the messages gets produced to > > the > > >> same partition, it always gets consumed in the same order. > > >> > > >> I need expert opinions like what's the ideal partition count I should > > >> consider without effecting performance.( I am looking for close to > > 100,000 > > >> messages per seconds). > > >> The topics are from 0 to 9 and when I am producing messages I do > > something > > >> like uniqueUserId % 10 , and then pointing to a respective topic like > 0 > > || > > >> 1 || 2 etc.. > > >> > > >> Abhi > > >> > > >> > > >> > > >> > > >> -- > > >> If you can't succeed, call it version 1.0 > > >> > > > > > > > > > > > > -- > > > If you can't succeed, call it version 1.0 > > > > > > -- > If you can't succeed, call it version 1.0 > -- If you can't succeed, call it version 1.0
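For reference, a minimal Java producer sketch of the advice above, with a made-up topic name ("user-events") and broker address; the user ID 4456 from the thread is used as the record key. No partition is specified, so the default partitioner hashes the key, and every record for that user lands on the same partition and is consumed in order:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedOrderingExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String userId = "4456"; // the key: all events for this user go to one partition
            for (int i = 0; i < 5; i++) {
                // No explicit partition is given; the default partitioner hashes the key.
                producer.send(new ProducerRecord<>("user-events", userId, "event-" + i));
            }
        }
    }
}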
Re: Regarding Kafka
What about the order of message getting received ? If i don't mention the partition. Lets say if i have user ID :4456 and I have to do some analytics at the Kafka Consumer end and at my consumer end if its not getting consumed the way I sent, then my analytics will go haywire. Abhi On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote: > You don't even have to do that because the default partitioner will spread > the data you publish to the topic over the available partitions for you. > Just try it out to see. Publish multiple messages to the topic without > using keys, and without specifying a partition, and observe that they are > automatically distributed out over the available partitions. > > > //h...@confluent.io > Original message From: Abhit Kalsotra <abhit...@gmail.com> > Date: 10/8/16 11:19 PM (GMT-08:00) To: users@kafka.apache.org Subject: > Re: Regarding Kafka > Hans > > Thanks for the response, yeah you can say yeah I am treating topics like > partitions, because my > > current logic of producing to a respective topic goes something like this > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_ > kafkaTopic[whichTopic], > partition, > > RdKafka::Producer::RK_MSG_COPY, > ptr, > size, > > , > NULL); > where partitionKey is unique number or userID, so what I am doing currently > each partitionKey%10 > so whats so ever is the remainder, I am dumping that to the respective > topic. > > But as per your suggestion, Let me create close to 40-50 partitions for a > single topic and when i am producing I do something like this > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic, > > partition%(50), > > RdKafka::Producer::RK_MSG_COPY, > ptr, > size, > > , > NULL); > > Abhi > > On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote: > > > Why do you have 10 topics? It seems like you are treating topics like > > partitions and it's unclear why you don't just have 1 topic with 10, 20, > or > > even 30 partitions. Ordering is only guaranteed at a partition level. > > > > In general if you want to capacity plan for partitions you benchmark a > > single partition and then divide your peak estimated throughput by the > > results of the single partition results. > > > > If you expect the peak throughput to increase over time then double your > > partition count to allow room to grow the number of consumers without > > having to repartition. > > > > Sizing can be a bit more tricky if you are using keys but it doesn't > sound > > like you are if today you are publishing to topics the way you describe. > > > > -hans > > > > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote: > > > > > > Guys any views ? > > > > > > Abhi > > > > > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com> > > wrote: > > >> > > >> Hello > > >> > > >> I am using librdkafka c++ library for my application . > > >> > > >> *My Kafka Cluster Set up* > > >> 2 Kafka Zookeper running on 2 different instances > > >> 7 Kafka Brokers , 4 Running on 1 machine and 3 running on other > machine > > >> Total 10 Topics and partition count is 3 with replication factor of 3. > > >> > > >> Now in my case I need to be very specific for the *message order* > when I > > >> am consuming the messages. I know if all the messages gets produced to > > the > > >> same partition, it always gets consumed in the same order. 
> > >> > > >> I need expert opinions like what's the ideal partition count I should > > >> consider without effecting performance.( I am looking for close to > > 100,000 > > >> messages per seconds). > > >> The topics are from 0 to 9 and when I am producing messages I do > > something > > >> like uniqueUserId % 10 , and then pointing to a respective topic like > 0 > > || > > >> 1 || 2 etc.. > > >> > > >> Abhi > > >> > > >> > > >> > > >> > > >> -- > > >> If you can't succeed, call it version 1.0 > > >> > > > > > > > > > > > > -- > > > If you can't succeed, call it version 1.0 > > > > > > -- > If you can't succeed, call it version 1.0 > -- If you can't succeed, call it version 1.0
Re: Regarding Kafka
You don't even have to do that because the default partitioner will spread the data you publish to the topic over the available partitions for you. Just try it out to see. Publish multiple messages to the topic without using keys, and without specifying a partition, and observe that they are automatically distributed out over the available partitions. //h...@confluent.io Original message From: Abhit Kalsotra <abhit...@gmail.com> Date: 10/8/16 11:19 PM (GMT-08:00) To: users@kafka.apache.org Subject: Re: Regarding Kafka Hans Thanks for the response, yeah you can say yeah I am treating topics like partitions, because my current logic of producing to a respective topic goes something like this RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic], partition, RdKafka::Producer::RK_MSG_COPY, ptr, size, , NULL); where partitionKey is unique number or userID, so what I am doing currently each partitionKey%10 so whats so ever is the remainder, I am dumping that to the respective topic. But as per your suggestion, Let me create close to 40-50 partitions for a single topic and when i am producing I do something like this RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic, partition%(50), RdKafka::Producer::RK_MSG_COPY, ptr, size, , NULL); Abhi On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote: > Why do you have 10 topics? It seems like you are treating topics like > partitions and it's unclear why you don't just have 1 topic with 10, 20, or > even 30 partitions. Ordering is only guaranteed at a partition level. > > In general if you want to capacity plan for partitions you benchmark a > single partition and then divide your peak estimated throughput by the > results of the single partition results. > > If you expect the peak throughput to increase over time then double your > partition count to allow room to grow the number of consumers without > having to repartition. > > Sizing can be a bit more tricky if you are using keys but it doesn't sound > like you are if today you are publishing to topics the way you describe. > > -hans > > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote: > > > > Guys any views ? > > > > Abhi > > > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com> > wrote: > >> > >> Hello > >> > >> I am using librdkafka c++ library for my application . > >> > >> *My Kafka Cluster Set up* > >> 2 Kafka Zookeper running on 2 different instances > >> 7 Kafka Brokers , 4 Running on 1 machine and 3 running on other machine > >> Total 10 Topics and partition count is 3 with replication factor of 3. > >> > >> Now in my case I need to be very specific for the *message order* when I > >> am consuming the messages. I know if all the messages gets produced to > the > >> same partition, it always gets consumed in the same order. > >> > >> I need expert opinions like what's the ideal partition count I should > >> consider without effecting performance.( I am looking for close to > 100,000 > >> messages per seconds). > >> The topics are from 0 to 9 and when I am producing messages I do > something > >> like uniqueUserId % 10 , and then pointing to a respective topic like 0 > || > >> 1 || 2 etc.. > >> > >> Abhi > >> > >> > >> > >> > >> -- > >> If you can't succeed, call it version 1.0 > >> > > > > > > > > -- > > If you can't succeed, call it version 1.0 > -- If you can't succeed, call it version 1.0
Re: Regarding Kafka
Hans Thanks for the response. Yeah, you can say I am treating topics like partitions, because my current logic of producing to a respective topic goes something like this: RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic], partition, RdKafka::Producer::RK_MSG_COPY, ptr, size, , NULL); where partitionKey is a unique number or userID, so what I am doing currently is partitionKey%10, and whatever the remainder is, I am dumping the message to that respective topic. But as per your suggestion, let me create close to 40-50 partitions for a single topic, and when I am producing I do something like this: RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic, partition%(50), RdKafka::Producer::RK_MSG_COPY, ptr, size, , NULL); Abhi On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen wrote: > Why do you have 10 topics? It seems like you are treating topics like > partitions and it's unclear why you don't just have 1 topic with 10, 20, or > even 30 partitions. Ordering is only guaranteed at a partition level. > > In general if you want to capacity plan for partitions you benchmark a > single partition and then divide your peak estimated throughput by the > result of the single-partition benchmark. > > If you expect the peak throughput to increase over time then double your > partition count to allow room to grow the number of consumers without > having to repartition. > > Sizing can be a bit more tricky if you are using keys but it doesn't sound > like you are if today you are publishing to topics the way you describe. > > -hans > > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra wrote: > > > > Guys any views ? > > > > Abhi > > > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra > wrote: > >> > >> Hello > >> > >> I am using the librdkafka C++ library for my application. > >> > >> *My Kafka Cluster Set up* > >> 2 Kafka ZooKeeper nodes running on 2 different instances > >> 7 Kafka brokers, 4 running on 1 machine and 3 running on the other machine > >> Total 10 topics, and the partition count is 3 with a replication factor of 3. > >> > >> Now in my case I need to be very specific about the *message order* when I > >> am consuming the messages. I know that if all the messages get produced to > the > >> same partition, they always get consumed in the same order. > >> > >> I need expert opinions on what's the ideal partition count I should > >> consider without affecting performance (I am looking for close to > 100,000 > >> messages per second). > >> The topics are numbered from 0 to 9, and when I am producing messages I do > something > >> like uniqueUserId % 10, and then point to a respective topic like 0 > || > >> 1 || 2 etc. > >> > >> Abhi > >> > >> > >> > >> > >> -- > >> If you can't succeed, call it version 1.0 > >> > > > > > > > > -- > > If you can't succeed, call it version 1.0 > -- > If you can't succeed, call it version 1.0 -- If you can't succeed, call it version 1.0
Re: Regarding Kafka
Why do you have 10 topics? It seems like you are treating topics like partitions, and it's unclear why you don't just have 1 topic with 10, 20, or even 30 partitions. Ordering is only guaranteed at a partition level. In general, if you want to capacity plan for partitions, you benchmark a single partition and then divide your peak estimated throughput by the result of the single-partition benchmark. If you expect the peak throughput to increase over time, then double your partition count to allow room to grow the number of consumers without having to repartition. Sizing can be a bit more tricky if you are using keys, but it doesn't sound like you are, if today you are publishing to topics the way you describe. -hans > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra wrote: > > Guys any views ? > > Abhi > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra wrote: >> >> Hello >> >> I am using the librdkafka C++ library for my application. >> >> *My Kafka Cluster Set up* >> 2 Kafka ZooKeeper nodes running on 2 different instances >> 7 Kafka brokers, 4 running on 1 machine and 3 running on the other machine >> Total 10 topics, and the partition count is 3 with a replication factor of 3. >> >> Now in my case I need to be very specific about the *message order* when I >> am consuming the messages. I know that if all the messages get produced to the >> same partition, they always get consumed in the same order. >> >> I need expert opinions on what's the ideal partition count I should >> consider without affecting performance (I am looking for close to 100,000 >> messages per second). >> The topics are numbered from 0 to 9, and when I am producing messages I do something >> like uniqueUserId % 10, and then point to a respective topic like 0 || >> 1 || 2 etc. >> >> Abhi >> >> >> >> >> -- >> If you can't succeed, call it version 1.0 >> > > > > -- > If you can't succeed, call it version 1.0
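As a worked example of that sizing rule (the per-partition figure is purely illustrative): if a benchmark shows one partition sustaining about 10,000 messages/sec with your message size and acks settings, then the 100,000 messages/sec target mentioned in the thread works out to 100,000 / 10,000 = 10 partitions at minimum, and doubling to 20 leaves headroom to add consumers later without repartitioning.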
Re: Regarding Kafka
Guys any views ? Abhi On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra wrote: > Hello > > I am using the librdkafka C++ library for my application. > > *My Kafka Cluster Set up* > 2 Kafka ZooKeeper nodes running on 2 different instances > 7 Kafka brokers, 4 running on 1 machine and 3 running on the other machine > Total 10 topics, and the partition count is 3 with a replication factor of 3. > > Now in my case I need to be very specific about the *message order* when I > am consuming the messages. I know that if all the messages get produced to the > same partition, they always get consumed in the same order. > > I need expert opinions on what's the ideal partition count I should > consider without affecting performance (I am looking for close to 100,000 > messages per second). > The topics are numbered from 0 to 9, and when I am producing messages I do something > like uniqueUserId % 10, and then point to a respective topic like 0 || > 1 || 2 etc. > > Abhi > > > > > -- > If you can't succeed, call it version 1.0 > -- If you can't succeed, call it version 1.0
RE: Regarding kafka partition and replication
Having multiple brokers on the same node has a couple of problems for a production installation: 1. You'll have multiple brokers contending for disk and memory resources. 2. You could have your partitions replicated to the same node, which means that if that node fails you would lose data. I think you are better off having 3 nodes with 3 brokers in total (one broker per node). You can keep the 9 partitions in case you want to add physical nodes in the future, and use a replication factor of 2 or 3. -Dave Dave Tauzell | Senior Software Engineer | Surescripts O: 651.855.3042 | www.surescripts.com | dave.tauz...@surescripts.com Connect with us: Twitter I LinkedIn I Facebook I YouTube -Original Message- From: Amit K [mailto:amitk@gmail.com] Sent: Monday, July 18, 2016 8:55 PM To: users@kafka.apache.org Subject: Regarding kafka partition and replication Hi, I have a Kafka cluster of 3 nodes, each with 3 brokers, along with a 3-node ZooKeeper cluster. So, 9 brokers in total spread across 3 different machines. I am on Kafka 0.9. In order to optimally use the infrastructure for 2 topics (which, as of now, are not expected to grow drastically in the near future), I am thinking of having 9 partitions with a replication factor of 3 (or 6?). Will this give me a good distribution of partitions and replicas across brokers? The system does not have a huge load (<50 requests/sec of less than 1 KB each) as of now and is not expected to get a higher load than this. If this replication and partitioning scheme does not help, please suggest a better topic partition and replication strategy. Also, please guide me to any articles or documents about setting up a multi-node Kafka cluster with regard to partitions, replication, and general properties to be used (a kind of good-practices guide, etc.). Thanks, Amit This e-mail and any files transmitted with it are confidential, may contain sensitive information, and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error, please notify the sender by reply e-mail immediately and destroy all copies of the e-mail and any attachments.
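For reference, a minimal Java sketch of creating the topic layout discussed above (9 partitions, replication factor 3); the broker address and topic name are made-up placeholders, and the AdminClient API shown here did not exist yet in 0.9, where you would use the kafka-topics.sh script instead. With 3 broker nodes, each partition's 3 replicas land on 3 distinct machines, so losing any single node loses no data:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical
        try (AdminClient admin = AdminClient.create(props)) {
            // 9 partitions, replication factor 3: each broker leads roughly 3 partitions,
            // and every partition has a replica on all 3 machines.
            NewTopic topic = new NewTopic("events", 9, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}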
Re: Regarding Kafka Log compaction Features
Hi! Please have a look at this article. It helped me to use the log compaction mechanism; I hope it helps. Regards, Florin http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/ On Thursday, May 5, 2016, Behera, Himansu (Contractor) < himansu_beh...@cable.comcast.com> wrote: > Hi Team, > > > > I am working on implementing the Kafka log compaction feature in my > project. > > > > Please find the server.properties attached. I have made all the config changes > needed/suggested in the Kafka log compaction forum, but was not able to > resolve the issue. > > > > My use case is as follows: > > > > Step 1: We send a keyed message (String, String) from one of the producers > to the topic. > > Step 2: Then we send around 10 million keyed messages (with unique keys) to > the above topic. > > Step 3: Then, after 1800 secs, we try to send an update to the key from step 1 with some > value other than the one in step 1. > > > > Expected result: The key should be updated with the recent value. > > Actual result: The updated key still contains the old value. > > > > > > I would appreciate it if someone could help me with implementing the log compaction > feature POC. > > > > Please find the server.properties attached for your reference. > > > > Regards, > > Himansu > > >
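For reference, a minimal Java sketch (using the newer AdminClient API, with a made-up topic name, broker address, and deliberately aggressive tuning values) of the topic-level settings that usually matter when testing compaction. Two things commonly explain the behavior described above: the active log segment is never compacted, so nothing is cleaned until segments roll, and the cleaner only runs once the dirty ratio is exceeded. Also, even after compaction a consumer should treat the last record seen for a key as the current value; the older record simply disappears once the cleaner rewrites the segment. Broker-side, log.cleaner.enable must be true (the default in modern releases):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        Map<String, String> configs = new HashMap<>();
        configs.put(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT);
        configs.put(TopicConfig.SEGMENT_MS_CONFIG, "60000");               // roll segments often so they become cleanable
        configs.put(TopicConfig.MIN_CLEANABLE_DIRTY_RATIO_CONFIG, "0.01"); // clean aggressively for the test
        configs.put(TopicConfig.DELETE_RETENTION_MS_CONFIG, "1000");
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("compacted-keys", 3, (short) 3).configs(configs);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}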
Re: regarding Kafka Queing system
You can fetch messages by offset. https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchRequest On Fri, Feb 26, 2016 at 7:23 AM rahul shukla wrote: > Hello, > I am working on an SNMP trap parsing project for my academics. I am using the Kafka > messaging system in my project. Actually, I want to store the trap object, > which is received from the SNMP agent, in Kafka and retrieve that object on > the other side for further processing. > So, my query is: is there any way to store a particular event in > Kafka and retrieve that event on the other side for further processing? > > Please assist me. > > > Thanks & Regards > Rahul Shukla > +91-9826277980 >
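For reference, a minimal Java consumer sketch of fetching from a specific offset; the topic name ("snmp-traps"), partition, offset, and group id are made-up placeholders. Kafka addresses records by (topic, partition, offset) rather than by an arbitrary event id, so the usual pattern is to remember the offset when the record is produced or first consumed and seek() back to it later (newer clients can also look offsets up by timestamp via offsetsForTimes):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class FetchByOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "trap-readers");            // only needed if you commit offsets
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("snmp-traps", 0);
            consumer.assign(Collections.singletonList(tp)); // manual assignment, no group rebalancing
            consumer.seek(tp, 42L);                         // jump to the offset of the event to re-read
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
            }
        }
    }
}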
Re: Regarding Kafka release 0.8.2-beta
shouldn't the new consumer api be removed from the 0.8.2 code base then? On Fri, Jan 23, 2015 at 10:30 AM, Joe Stein joe.st...@stealth.ly wrote: The new consumer is scheduled for 0.9.0. Currently Kafka release candidate 2 for 0.8.2.0 is being voted on. There is an in progress patch to the new consumer that you can try out https://issues.apache.org/jira/browse/KAFKA-1760 /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Fri, Jan 23, 2015 at 1:55 AM, Reeni Mathew reenimathew...@gmail.com wrote: Hi Team, I was playing around with your recent release 0.8.2-beta. Producer worked fine whereas new consumer did not. org.apache.kafka.clients.consumer.KafkaConsumer After digging the code I realized that the implementation for the same is not available. Only API is present. Could you please let me know by when we can expect the implementation of the same. Thanks Regards Reeni
Re: Regarding Kafka release 0.8.2-beta
The new consumer api is actually excluded from the javadoc that we generate. Thanks, Jun On Mon, Jan 26, 2015 at 11:54 AM, Jason Rosenberg j...@squareup.com wrote: shouldn't the new consumer api be removed from the 0.8.2 code base then? On Fri, Jan 23, 2015 at 10:30 AM, Joe Stein joe.st...@stealth.ly wrote: The new consumer is scheduled for 0.9.0. Currently Kafka release candidate 2 for 0.8.2.0 is being voted on. There is an in progress patch to the new consumer that you can try out https://issues.apache.org/jira/browse/KAFKA-1760 /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Fri, Jan 23, 2015 at 1:55 AM, Reeni Mathew reenimathew...@gmail.com wrote: Hi Team, I was playing around with your recent release 0.8.2-beta. Producer worked fine whereas new consumer did not. org.apache.kafka.clients.consumer.KafkaConsumer After digging the code I realized that the implementation for the same is not available. Only API is present. Could you please let me know by when we can expect the implementation of the same. Thanks Regards Reeni
Re: Regarding Kafka release 0.8.2-beta
Maybe we should add "experimental" to the documentation so folks that don't know understand. /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop / On Jan 26, 2015 11:56 AM, Jason Rosenberg j...@squareup.com wrote: shouldn't the new consumer api be removed from the 0.8.2 code base then? On Fri, Jan 23, 2015 at 10:30 AM, Joe Stein joe.st...@stealth.ly wrote: The new consumer is scheduled for 0.9.0. Currently Kafka release candidate 2 for 0.8.2.0 is being voted on. There is an in progress patch to the new consumer that you can try out https://issues.apache.org/jira/browse/KAFKA-1760 /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Fri, Jan 23, 2015 at 1:55 AM, Reeni Mathew reenimathew...@gmail.com wrote: Hi Team, I was playing around with your recent release 0.8.2-beta. Producer worked fine whereas new consumer did not. org.apache.kafka.clients.consumer.KafkaConsumer After digging the code I realized that the implementation for the same is not available. Only API is present. Could you please let me know by when we can expect the implementation of the same. Thanks Regards Reeni