Re: Regarding Kafka connect task to partition relationship for both source and sink connectors

2024-05-30 Thread Alex Craig
For sink connectors, I believe you can scale the tasks up to match the
partitions on the topic.  But I don't believe this is the case for source
connectors; the number of partitions on the topic you're producing to has
nothing to do with the number of connector tasks.  It really depends on the
individual source connector and whether the source data type can benefit from
multiple tasks.  For example, the JDBC source connector (a very popular
connector) only supports 1 task - even if you're querying multiple tables.

Bottom line: you'll need to check the documentation for the connector in
question to see if it supports multiple tasks.
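
For connectors that do support it, the knob is the tasks.max property in the
connector config. A minimal sketch (the connector class and names here are
made-up placeholders, not a real plugin):

    name=my-sink
    connector.class=com.example.MySinkConnector
    topics=my-topic
    tasks.max=12

If my-topic has 12 partitions, up to 12 sink tasks can do useful work; setting
tasks.max higher than the partition count just leaves the extra tasks idle.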

Alex

On Thu, May 30, 2024 at 7:51 AM Sébastien Rebecchi 
wrote:

> Hello
>
> Confirmed. The partition is the smallest unit of parallelism, so having more
> consumers than partitions in a topic for the same consumer
> group is useless; with P partitions, maximum parallelism is reached
> with P consumers.
>
> Regards,
>
> Sébastien.
>
> On Thu, May 30, 2024 at 14:43, Yeikel Santana wrote:
>
> > Hi everyone,
> >
> >
> > From my understanding, if a topic has n partitions, we can create up to n
> > tasks for both the source and sink connectors to achieve the maximum
> > parallelism. Adding more tasks would not be beneficial, as the extra tasks
> > would remain idle, limited by the number of partitions of the topic.
> >
> >
> > Could you please confirm if this understanding is correct?
> >
> >
> > If this understanding is incorrect, could you please explain the
> > relationship, if any?
> >
> >
> > Thank you!
> >
> >
> >
>


Re: Regarding Kafka connect task to partition relationship for both source and sink connectors

2024-05-30 Thread Sébastien Rebecchi
Hello

Confirmed. The partition is the smallest unit of parallelism, so having more
consumers than partitions in a topic for the same consumer
group is useless; with P partitions, maximum parallelism is reached
with P consumers.
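
As a quick illustration (a minimal sketch; the broker address, topic, and
group names are made up), all P consumers simply share one group.id - a
(P+1)-th consumer in the same group would just sit idle:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "my-group");   // same group for all P consumers
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("my-topic")); // topic with P partitions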

Regards,

Sébastien.

On Thu, May 30, 2024 at 14:43, Yeikel Santana wrote:

> Hi everyone,
>
>
> From my understanding, if a topic has n partitions, we can create up to n
> tasks for both the source and sink connectors to achieve the maximum
> parallelism. Adding more tasks would not be beneficial, as the extra tasks
> would remain idle, limited by the number of partitions of the topic.
>
>
> Could you please confirm if this understanding is correct?
>
>
> If this understanding is incorrect, could you please explain the
> relationship, if any?
>
>
> Thank you!
>
>
>


Re: Regarding kafka 2.3.0

2022-08-25 Thread Luke Chen
1. Is Kafka 2.3.0 going end of life? If yes, what is the expected date?
-> Kafka supports the last 3 releases.

REF:
https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan#TimeBasedReleasePlan-WhatIsOurEOLPolicy

2. Is Kafka 3.1.0 backward compatible with 2.3.0?
-> Since the path from 2.3 to 3.1 crosses one major release (3.0), some
deprecated features have been removed. You can refer to this doc for the
upgrade guide: https://kafka.apache.org/documentation/#upgrade_3_1_0 , and
check the release notes for each release.
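
In practice the rolling upgrade boils down to a two-pass restart (a sketch
based on the upgrade guide; double-check the exact property values for your
versions):

    # pass 1: install the 3.1 binaries broker by broker, pinning the old protocol
    inter.broker.protocol.version=2.3

    # pass 2: once every broker runs 3.1, bump the protocol and roll again
    inter.broker.protocol.version=3.1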

Thanks
Luke

On Thu, Aug 25, 2022 at 3:42 PM Fred Bai  wrote:

> +1
> Me too. We are considering upgrading Kafka from 2.X to 3.X, but don't know
> about the compatibility.
>
> thx
>
> > On Tue, Aug 23, 2022 at 22:21, Ankit Saran wrote:
>
> > Hi Team,
> > We are planning to upgrade the Kafka version from 2.3.0 to 3.1.0. We have
> > the below queries regarding the same:
> >
> > 1. Is Kafka 2.3.0 going end of life? If yes, what is the expected date?
> > 2. Is Kafka 3.1.0 backward compatible with 2.3.0?
> >
> > Please help us with the above queries. Thanks in advance.
> >
> > Regards,
> > Ankit Saran
> >
>


Re: Regarding kafka 2.3.0

2022-08-25 Thread Fred Bai
+1
Me too. We are considering upgrading Kafka from 2.X to 3.X, but don't know
about the compatibility.

thx

On Tue, Aug 23, 2022 at 22:21, Ankit Saran wrote:

> Hi Team,
> We are planning to upgrade the Kafka version from 2.3.0 to 3.1.0. We have
> the below queries regarding the same:
>
> 1. Is Kafka 2.3.0 going end of life? If yes, what is the expected date?
> 2. Is Kafka 3.1.0 backward compatible with 2.3.0?
>
> Please help us with the above queries. Thanks in advance.
>
> Regards,
> Ankit Saran
>


Re: Regarding Kafka Consumer

2017-11-24 Thread simarpreet kaur
Thanks, Faraz.

I am using its Java API. It does not seem to provide such a method on the
consumer.
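
One workaround (just a sketch, not part of the client API) is to keep a
handle on the Properties used to build the consumer and expose them yourself:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Wrapper that remembers its own configuration, since the Java
    // KafkaConsumer does not expose a getter for it.
    public class ConfiguredConsumer<K, V> {
        private final Properties props;
        private final KafkaConsumer<K, V> consumer;

        public ConfiguredConsumer(Properties props) {
            this.props = props;
            this.consumer = new KafkaConsumer<>(props);
        }

        public String groupId() {
            return props.getProperty(ConsumerConfig.GROUP_ID_CONFIG);
        }

        public KafkaConsumer<K, V> consumer() {
            return consumer;
        }
    }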

On Wed, Nov 22, 2017 at 2:45 PM, Faraz Mateen  wrote:

> Not sure which client you are using.
> In kafka-python, consumer.config returns a dictionary with all consumer
> properties.
>
> Thanks,
> Faraz
>
> On Mon, Nov 20, 2017 at 5:34 PM, simarpreet kaur 
> wrote:
>
>> Hello team,
>>
>> I wanted to know if there is some way I can retrieve consumer properties
>> from the Kafka Consumer. For example, if at runtime I want to know the
>> group id of a particular consumer, in case multiple consumers are running
>> in my application.
>>
>> Thanks & Regards,
>> Simarpreet
>>
>
>


Re: Regarding kafka-manager topics parameters

2017-10-18 Thread Ted Yu
Images didn't come through.

Consider using a third-party image hosting website.

On Tue, Oct 17, 2017 at 9:36 PM, Pavan Patani 
wrote:

> Hello,
>
> Previously I was using an old version of Kafka-manager and it was showing
> the "Producer Message/Sec" and "Summed Recent Offsets" parameters in topics,
> as below.
>
> [image: Inline image 1]
> Currently I have installed kafka-manager-1.3.3.14 and now I cannot see
> these two "Producer Message/Sec" and "Summed Recent Offsets" parameters in
> topics, as below.
>
> [image: Inline image 2]
>
> Could you please guide me on how to add these two columns?
>
> Regards,
> Pavan Patani
>
>
>


Re: Regarding Kafka

2016-10-09 Thread Abhit Kalsotra
Yeah, I realized that, and later read that the Java Kafka consumer uses
one thread, which is why such behavior does not arise there. Maybe I need
to restrict my application to a single thread :( in order to achieve
that. I need to ask Magnus Edenhill, who is the librdkafka expert.
Thanks for your time, Hans

Abhi

On Sun, Oct 9, 2016 at 10:08 PM, Hans Jespersen <h...@confluent.io> wrote:

> I'm pretty sure Jun was talking about the Java API in the quoted blog
> text, not librdkafka. There is only one thread in the new Java consumer so
> you wouldn't see this behavior. I do not think that librdkafka makes any
> such guarantee to dispatch unique keys to each thread, but I'm not an expert
> in librdkafka, so others may be able to help you better on that.
> //h...@confluent.io
>  Original message From: Abhit Kalsotra <abhit...@gmail.com>
> Date: 10/9/16  3:58 AM  (GMT-08:00) To: users@kafka.apache.org Subject:
> Re: Regarding Kafka
> I did that, but I am getting confusing results.
>
> E.g.:
>
> I have created 4 Kafka Consumer threads for doing data analytics; these
> threads just wait for Kafka messages to be consumed, and
> I have provided the key when I produce, which means that all the
> messages will go to one single partition, ref "
> http://www.confluent.io/blog/how-to-choose-the-number-of-
> topicspartitions-in-a-kafka-cluster/
> "
> "* On the consumer side, Kafka always gives a single partition’s data to
> one consumer thread.*"
>
> If you look at my application logs, my 4 Kafka Consumer Application threads
> are all calling consume(). Shouldn't all messages of a particular ID
> be consumed by one Kafka Application thread?
>
> [2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 74 ][ID ID
> date:2016-09-28 20:07:32.000 ]
> [2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4496 offset: 80 ][ID ID
> date: 2016-09-28 20:07:39.000 ]
> [2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4495 offset: 77 ][ID
> date: 2016-09-28 20:07:35.000 ]
> [2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 76][ID
> date: 2016-09-28 20:07:34.000 ]
> [2016-10-08 23:37:07.498]AxThreadId 9540 ->ID:4495 offset: 75 ][ID
> date: 2016-09-28 20:07:33.000 ]
> [2016-10-08 23:37:07.499]AxThreadId 23516 ->ID:4495 offset: 78 ][ID
> date: 2016-09-28 20:07:36.000 ]
> [2016-10-08 23:37:07.499]AxThreadId 2208 ->ID:4495 offset: 79 ][ID
> date: 2016-09-28 20:07:37.000 ]
> [2016-10-08 23:37:07.499]AxThreadId 9540 ->ID:4495 offset: 80 ][ID
> date: 2016-09-28 20:07:38.000 ]
> [2016-10-08 23:37:07.500]AxThreadId 23516 ->ID:4495 offset: 81][ID
> date: 2016-09-28 20:07:39.000 ]
>
>
>
>
> On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote:
>
> > Then publish with the user ID as the key and all messages for the same
> key
> > will be guaranteed to go to the same partition and therefore be in order
> > for whichever consumer gets that partition.
> >
> >
> > //h...@confluent.io
> >  Original message From: Abhit Kalsotra <
> abhit...@gmail.com>
> > Date: 10/9/16  12:39 AM  (GMT-08:00) To: users@kafka.apache.org Subject:
> > Re: Regarding Kafka
> > What about the order in which messages are received, if I don't specify
> > the partition?
> >
> > Let's say I have user ID 4456 and I have to do some analytics at the
> > Kafka Consumer end; if messages are not consumed in the order I sent them,
> > then my analytics will go haywire.
> >
> > Abhi
> >
> > On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io>
> wrote:
> >
> > > You don't even have to do that because the default partitioner will
> > spread
> > > the data you publish to the topic over the available partitions for
> you.
> > > Just try it out to see. Publish multiple messages to the topic without
> > > using keys, and without specifying a partition, and observe that they
> are
> > > automatically distributed out over the available partitions.
> > >
> > >
> > > //h...@confluent.io
> > >  Original message From: Abhit Kalsotra <
> > abhit...@gmail.com>
> > > Date: 10/8/16  11:19 PM  (GMT-08:00) To: users@kafka.apache.org
> Subject:
> > > Re: Regarding Kafka
> > > Hans
> > >
> > > Thanks for the response; yeah, you can say I am treating topics like
> > > partitions, because my current logic of producing to a respective topic
> > > goes something like this:
> > >
> > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
> > >                                                     partition,
> > >                                                     RdKafka::Producer::RK_MSG_COPY,
> > >                                                     ptr,
> > >                                                     size,
> > >                                                     &partitionKey,
> > >                                                     NULL);

Re: Regarding Kafka

2016-10-09 Thread Hans Jespersen
I'm pretty sure Jun was talking about the Java API in the quoted blog text, not 
librdkafka. There is only one thread in the new Java consumer so you wouldn't 
see this behavior. I do not think that librdkafka makes any such guarantee to 
dispatch unique keys to each thread, but I'm not an expert in librdkafka, so
others may be able to help you better on that.
//h...@confluent.io
 Original message From: Abhit Kalsotra <abhit...@gmail.com> 
Date: 10/9/16  3:58 AM  (GMT-08:00) To: users@kafka.apache.org Subject: Re: 
Regarding Kafka 
I did that, but I am getting confusing results.

E.g.:

I have created 4 Kafka Consumer threads for doing data analytics; these
threads just wait for Kafka messages to be consumed, and
I have provided the key when I produce, which means that all the
messages will go to one single partition, ref "
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
"
"* On the consumer side, Kafka always gives a single partition’s data to
one consumer thread.*"

If you look at my application logs, my 4 Kafka Consumer Application threads
are all calling consume(). Shouldn't all messages of a particular ID
be consumed by one Kafka Application thread?

[2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 74 ][ID ID
date:2016-09-28 20:07:32.000 ]
[2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4496 offset: 80 ][ID ID
date: 2016-09-28 20:07:39.000 ]
[2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4495 offset: 77 ][ID
date: 2016-09-28 20:07:35.000 ]
[2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 76][ID
date: 2016-09-28 20:07:34.000 ]
[2016-10-08 23:37:07.498]AxThreadId 9540 ->ID:4495 offset: 75 ][ID
date: 2016-09-28 20:07:33.000 ]
[2016-10-08 23:37:07.499]AxThreadId 23516 ->ID:4495 offset: 78 ][ID
date: 2016-09-28 20:07:36.000 ]
[2016-10-08 23:37:07.499]AxThreadId 2208 ->ID:4495 offset: 79 ][ID
date: 2016-09-28 20:07:37.000 ]
[2016-10-08 23:37:07.499]AxThreadId 9540 ->ID:4495 offset: 80 ][ID
date: 2016-09-28 20:07:38.000 ]
[2016-10-08 23:37:07.500]AxThreadId 23516 ->ID:4495 offset: 81][ID
date: 2016-09-28 20:07:39.000 ]




On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote:

> Then publish with the user ID as the key and all messages for the same key
> will be guaranteed to go to the same partition and therefore be in order
> for whichever consumer gets that partition.
>
>
> //h...@confluent.io
>  Original message From: Abhit Kalsotra <abhit...@gmail.com>
> Date: 10/9/16  12:39 AM  (GMT-08:00) To: users@kafka.apache.org Subject:
> Re: Regarding Kafka
> What about the order in which messages are received, if I don't specify
> the partition?
>
> Let's say I have user ID 4456 and I have to do some analytics at the
> Kafka Consumer end; if messages are not consumed in the order I sent them,
> then my analytics will go haywire.
>
> Abhi
>
> On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote:
>
> > You don't even have to do that because the default partitioner will
> spread
> > the data you publish to the topic over the available partitions for you.
> > Just try it out to see. Publish multiple messages to the topic without
> > using keys, and without specifying a partition, and observe that they are
> > automatically distributed out over the available partitions.
> >
> >
> > //h...@confluent.io
> >  Original message From: Abhit Kalsotra <
> abhit...@gmail.com>
> > Date: 10/8/16  11:19 PM  (GMT-08:00) To: users@kafka.apache.org Subject:
> > Re: Regarding Kafka
> > Hans
> >
> > Thanks for the response; yeah, you can say I am treating topics like
> > partitions, because my current logic of producing to a respective topic
> > goes something like this:
> >
> > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
> >                                                    partition,
> >                                                    RdKafka::Producer::RK_MSG_COPY,
> >                                                    ptr,
> >                                                    size,
> >                                                    &partitionKey,
> >                                                    NULL);
> >
> > where partitionKey is a unique number or userID. So what I am doing
> > currently is partitionKey % 10, and whatever the remainder is, I dump
> > the message to the respective topic.
> >
> > But as per your suggestion, let me create close to 40-50 partitions for a
> > single topic and, when I am producing, do something like this:
> >
> > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
> >                                                    partition % 50,
> >                                                    RdKafka::Producer::RK_MSG_COPY,
> >                                                    ptr,
> >                                                    size,
> >                                                    &partitionKey,
> >                                                    NULL);

Re: Regarding Kafka

2016-10-09 Thread Abhit Kalsotra
I did that, but I am getting confusing results.

E.g.:

I have created 4 Kafka Consumer threads for doing data analytics; these
threads just wait for Kafka messages to be consumed, and
I have provided the key when I produce, which means that all the
messages will go to one single partition, ref "
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
"
"* On the consumer side, Kafka always gives a single partition’s data to
one consumer thread.*"

If you look at my application logs, my 4 Kafka Consumer Application threads
are all calling consume(). Shouldn't all messages of a particular ID
be consumed by one Kafka Application thread?

[2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 74 ][ID ID
date:2016-09-28 20:07:32.000 ]
[2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4496 offset: 80 ][ID ID
date: 2016-09-28 20:07:39.000 ]
[2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4495 offset: 77 ][ID
date: 2016-09-28 20:07:35.000 ]
[2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 76][ID
date: 2016-09-28 20:07:34.000 ]
[2016-10-08 23:37:07.498]AxThreadId 9540 ->ID:4495 offset: 75 ][ID
date: 2016-09-28 20:07:33.000 ]
[2016-10-08 23:37:07.499]AxThreadId 23516 ->ID:4495 offset: 78 ][ID
date: 2016-09-28 20:07:36.000 ]
[2016-10-08 23:37:07.499]AxThreadId 2208 ->ID:4495 offset: 79 ][ID
date: 2016-09-28 20:07:37.000 ]
[2016-10-08 23:37:07.499]AxThreadId 9540 ->ID:4495 offset: 80 ][ID
date: 2016-09-28 20:07:38.000 ]
[2016-10-08 23:37:07.500]AxThreadId 23516 ->ID:4495 offset: 81][ID
date: 2016-09-28 20:07:39.000 ]




On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote:

> Then publish with the user ID as the key and all messages for the same key
> will be guaranteed to go to the same partition and therefore be in order
> for whichever consumer gets that partition.
>
>
> //h...@confluent.io
>  Original message From: Abhit Kalsotra <abhit...@gmail.com>
> Date: 10/9/16  12:39 AM  (GMT-08:00) To: users@kafka.apache.org Subject:
> Re: Regarding Kafka
> What about the order in which messages are received, if I don't specify
> the partition?
>
> Let's say I have user ID 4456 and I have to do some analytics at the
> Kafka Consumer end; if messages are not consumed in the order I sent them,
> then my analytics will go haywire.
>
> Abhi
>
> On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote:
>
> > You don't even have to do that because the default partitioner will
> spread
> > the data you publish to the topic over the available partitions for you.
> > Just try it out to see. Publish multiple messages to the topic without
> > using keys, and without specifying a partition, and observe that they are
> > automatically distributed out over the available partitions.
> >
> >
> > //h...@confluent.io
> >  Original message From: Abhit Kalsotra <
> abhit...@gmail.com>
> > Date: 10/8/16  11:19 PM  (GMT-08:00) To: users@kafka.apache.org Subject:
> > Re: Regarding Kafka
> > Hans
> >
> > Thanks for the response; yeah, you can say I am treating topics like
> > partitions, because my current logic of producing to a respective topic
> > goes something like this:
> >
> > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
> >                                                    partition,
> >                                                    RdKafka::Producer::RK_MSG_COPY,
> >                                                    ptr,
> >                                                    size,
> >                                                    &partitionKey,
> >                                                    NULL);
> >
> > where partitionKey is a unique number or userID. So what I am doing
> > currently is partitionKey % 10, and whatever the remainder is, I dump
> > the message to the respective topic.
> >
> > But as per your suggestion, let me create close to 40-50 partitions for a
> > single topic and, when I am producing, do something like this:
> >
> > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
> >                                                    partition % 50,
> >                                                    RdKafka::Producer::RK_MSG_COPY,
> >                                                    ptr,
> >                                                    size,
> >                                                    &partitionKey,
> >                                                    NULL);
> >
> > Abhi
> >
> > On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io>
> wrote:
> >
> > > Why do you have 10 topics?  It seems like you are treating topics like
> > > partitions and it's unclear why you don't just have 1 topic with 10, 20, or
> > > even 30 partitions. Ordering is only guaranteed at a partition level.

Re: Regarding Kafka

2016-10-09 Thread Hans Jespersen
Then publish with the user ID as the key and all messages for the same key will 
be guaranteed to go to the same partition and therefore be in order for 
whichever consumer gets that partition.
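
With the Java producer, that is just a keyed send (a minimal sketch; the
broker address and topic name are made up, and userId/payload stand in for
your own variables - the librdkafka produce() call accepts a key the same
way):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    // Same key -> same partition -> per-user ordering.
    producer.send(new ProducerRecord<>("events", String.valueOf(userId), payload));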


//h...@confluent.io
 Original message From: Abhit Kalsotra <abhit...@gmail.com> 
Date: 10/9/16  12:39 AM  (GMT-08:00) To: users@kafka.apache.org Subject: Re: 
Regarding Kafka 
What about the order in which messages are received, if I don't specify
the partition?

Let's say I have user ID 4456 and I have to do some analytics at the
Kafka Consumer end; if messages are not consumed in the order I sent them,
then my analytics will go haywire.

Abhi

On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote:

> You don't even have to do that because the default partitioner will spread
> the data you publish to the topic over the available partitions for you.
> Just try it out to see. Publish multiple messages to the topic without
> using keys, and without specifying a partition, and observe that they are
> automatically distributed out over the available partitions.
>
>
> //h...@confluent.io
>  Original message From: Abhit Kalsotra <abhit...@gmail.com>
> Date: 10/8/16  11:19 PM  (GMT-08:00) To: users@kafka.apache.org Subject:
> Re: Regarding Kafka
> Hans
>
> Thanks for the response; yeah, you can say I am treating topics like
> partitions, because my current logic of producing to a respective topic
> goes something like this:
>
> RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
>                                                    partition,
>                                                    RdKafka::Producer::RK_MSG_COPY,
>                                                    ptr,
>                                                    size,
>                                                    &partitionKey,
>                                                    NULL);
>
> where partitionKey is a unique number or userID. So what I am doing
> currently is partitionKey % 10, and whatever the remainder is, I dump
> the message to the respective topic.
>
> But as per your suggestion, let me create close to 40-50 partitions for a
> single topic and, when I am producing, do something like this:
>
> RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
>                                                    partition % 50,
>                                                    RdKafka::Producer::RK_MSG_COPY,
>                                                    ptr,
>                                                    size,
>                                                    &partitionKey,
>                                                    NULL);
>
> Abhi
>
> On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote:
>
> > Why do you have 10 topics?  It seems like you are treating topics like
> > partitions and it's unclear why you don't just have 1 topic with 10, 20,
> or
> > even 30 partitions. Ordering is only guaranteed at a partition level.
> >
> > In general, if you want to capacity plan for partitions, you benchmark a
> > single partition and then divide your peak estimated throughput by the
> > single-partition result.
> >
> > If you expect the peak throughput to increase over time then double your
> > partition count to allow room to grow the number of consumers without
> > having to repartition.
> >
> > Sizing can be a bit more tricky if you are using keys but it doesn't
> sound
> > like you are if today you are publishing to topics the way you describe.
> >
> > -hans
> >
> > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> > >
> > > Guys, any views?
> > >
> > > Abhi
> > >
> > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com>
> > wrote:
> > >>
> > >> Hello
> > >>
> > >> I am using the librdkafka C++ library for my application.
> > >>
> > >> *My Kafka Cluster Set up*
> > >> 2 Kafka Zookeepers running on 2 different instances
> > >> 7 Kafka Brokers, 4 running on 1 machine and 3 running on the other machine
> > >> Total 10 Topics, each with a partition count of 3 and a replication factor of 3.
> > >>
> > >> Now in my case I need to be very specific about the *message order* when I
> > >> am consuming the messages. I know that if all the messages get produced to
> > >> the same partition, they always get consumed in the same order.
> > >>
> > >> I need expert opinions on what's the ideal partition count I should
> > >> consider without affecting performance (I am looking for close to
> > >> 100,000 messages per second).
> > >> The topics are numbered 0 to 9, and when I am producing messages I do
> > >> something like uniqueUserId % 10, and then point to the respective topic,
> > >> like 0 || 1 || 2 etc.
> > >>
> > >> Abhi
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> If you can't succeed, call it version 1.0
> > >>
> > >
> > >
> > >
> > > --
> > > If you can't succeed, call it version 1.0
> >
>
>
>
> --
> If you can't succeed, call it version 1.0
>



-- 
If you can't succeed, call it version 1.0


Re: Regarding Kafka

2016-10-09 Thread Abhit Kalsotra
What about the order in which messages are received, if I don't specify
the partition?

Let's say I have user ID 4456 and I have to do some analytics at the
Kafka Consumer end; if messages are not consumed in the order I sent them,
then my analytics will go haywire.

Abhi

On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote:

> You don't even have to do that because the default partitioner will spread
> the data you publish to the topic over the available partitions for you.
> Just try it out to see. Publish multiple messages to the topic without
> using keys, and without specifying a partition, and observe that they are
> automatically distributed out over the available partitions.
>
>
> //h...@confluent.io
>  Original message From: Abhit Kalsotra <abhit...@gmail.com>
> Date: 10/8/16  11:19 PM  (GMT-08:00) To: users@kafka.apache.org Subject:
> Re: Regarding Kafka
> Hans
>
> Thanks for the response; yeah, you can say I am treating topics like
> partitions, because my current logic of producing to a respective topic
> goes something like this:
>
> RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
>                                                    partition,
>                                                    RdKafka::Producer::RK_MSG_COPY,
>                                                    ptr,
>                                                    size,
>                                                    &partitionKey,
>                                                    NULL);
>
> where partitionKey is a unique number or userID. So what I am doing
> currently is partitionKey % 10, and whatever the remainder is, I dump
> the message to the respective topic.
>
> But as per your suggestion, let me create close to 40-50 partitions for a
> single topic and, when I am producing, do something like this:
>
> RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
>                                                    partition % 50,
>                                                    RdKafka::Producer::RK_MSG_COPY,
>                                                    ptr,
>                                                    size,
>                                                    &partitionKey,
>                                                    NULL);
>
> Abhi
>
> On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote:
>
> > Why do you have 10 topics?  It seems like you are treating topics like
> > partitions and it's unclear why you don't just have 1 topic with 10, 20,
> or
> > even 30 partitions. Ordering is only guaranteed at a partition level.
> >
> > In general, if you want to capacity plan for partitions, you benchmark a
> > single partition and then divide your peak estimated throughput by the
> > single-partition result.
> >
> > If you expect the peak throughput to increase over time then double your
> > partition count to allow room to grow the number of consumers without
> > having to repartition.
> >
> > Sizing can be a bit more tricky if you are using keys but it doesn't
> sound
> > like you are if today you are publishing to topics the way you describe.
> >
> > -hans
> >
> > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> > >
> > > Guys, any views?
> > >
> > > Abhi
> > >
> > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com>
> > wrote:
> > >>
> > >> Hello
> > >>
> > >> I am using the librdkafka C++ library for my application.
> > >>
> > >> *My Kafka Cluster Set up*
> > >> 2 Kafka Zookeepers running on 2 different instances
> > >> 7 Kafka Brokers, 4 running on 1 machine and 3 running on the other machine
> > >> Total 10 Topics, each with a partition count of 3 and a replication factor of 3.
> > >>
> > >> Now in my case I need to be very specific about the *message order* when I
> > >> am consuming the messages. I know that if all the messages get produced to
> > >> the same partition, they always get consumed in the same order.
> > >>
> > >> I need expert opinions on what's the ideal partition count I should
> > >> consider without affecting performance (I am looking for close to
> > >> 100,000 messages per second).
> > >> The topics are numbered 0 to 9, and when I am producing messages I do
> > >> something like uniqueUserId % 10, and then point to the respective topic,
> > >> like 0 || 1 || 2 etc.
> > >>
> > >> Abhi
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> If you can't succeed, call it version 1.0
> > >>
> > >
> > >
> > >
> > > --
> > > If you can't succeed, call it version 1.0
> >
>
>
>
> --
> If you can't succeed, call it version 1.0
>



-- 
If you can't succeed, call it version 1.0


Re: Regarding Kafka

2016-10-09 Thread Hans Jespersen
You don't even have to do that because the default partitioner will spread the 
data you publish to the topic over the available partitions for you. Just try 
it out to see. Publish multiple messages to the topic without using keys, and 
without specifying a partition, and observe that they are automatically 
distributed out over the available partitions.


//h...@confluent.io
 Original message From: Abhit Kalsotra <abhit...@gmail.com> 
Date: 10/8/16  11:19 PM  (GMT-08:00) To: users@kafka.apache.org Subject: Re: 
Regarding Kafka 
Hans

Thanks for the response; yeah, you can say I am treating topics like
partitions, because my current logic of producing to a respective topic
goes something like this:

RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
                                                   partition,
                                                   RdKafka::Producer::RK_MSG_COPY,
                                                   ptr,
                                                   size,
                                                   &partitionKey,
                                                   NULL);

where partitionKey is a unique number or userID. So what I am doing
currently is partitionKey % 10, and whatever the remainder is, I dump the
message to the respective topic.

But as per your suggestion, let me create close to 40-50 partitions for a
single topic and, when I am producing, do something like this:

RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
                                                   partition % 50,
                                                   RdKafka::Producer::RK_MSG_COPY,
                                                   ptr,
                                                   size,
                                                   &partitionKey,
                                                   NULL);

Abhi

On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote:

> Why do you have 10 topics?  It seems like you are treating topics like
> partitions and it's unclear why you don't just have 1 topic with 10, 20, or
> even 30 partitions. Ordering is only guaranteed at a partition level.
>
> In general, if you want to capacity plan for partitions, you benchmark a
> single partition and then divide your peak estimated throughput by the
> single-partition result.
>
> If you expect the peak throughput to increase over time then double your
> partition count to allow room to grow the number of consumers without
> having to repartition.
>
> Sizing can be a bit more tricky if you are using keys but it doesn't sound
> like you are if today you are publishing to topics the way you describe.
>
> -hans
>
> > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> >
> > Guys, any views?
> >
> > Abhi
> >
> >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com>
> wrote:
> >>
> >> Hello
> >>
> >> I am using the librdkafka C++ library for my application.
> >>
> >> *My Kafka Cluster Set up*
> >> 2 Kafka Zookeepers running on 2 different instances
> >> 7 Kafka Brokers, 4 running on 1 machine and 3 running on the other machine
> >> Total 10 Topics, each with a partition count of 3 and a replication factor of 3.
> >>
> >> Now in my case I need to be very specific about the *message order* when I
> >> am consuming the messages. I know that if all the messages get produced to
> >> the same partition, they always get consumed in the same order.
> >>
> >> I need expert opinions on what's the ideal partition count I should
> >> consider without affecting performance (I am looking for close to
> >> 100,000 messages per second).
> >> The topics are numbered 0 to 9, and when I am producing messages I do
> >> something like uniqueUserId % 10, and then point to the respective topic,
> >> like 0 || 1 || 2 etc.
> >>
> >> Abhi
> >>
> >>
> >>
> >>
> >> --
> >> If you can't succeed, call it version 1.0
> >>
> >
> >
> >
> > --
> > If you can't succeed, call it version 1.0
>



-- 
If you can't succeed, call it version 1.0


Re: Regarding Kafka

2016-10-09 Thread Abhit Kalsotra
Hans

Thanks for the response; yeah, you can say I am treating topics like
partitions, because my current logic of producing to a respective topic
goes something like this:

RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
                                                   partition,
                                                   RdKafka::Producer::RK_MSG_COPY,
                                                   ptr,
                                                   size,
                                                   &partitionKey,
                                                   NULL);

where partitionKey is a unique number or userID. So what I am doing
currently is partitionKey % 10, and whatever the remainder is, I dump the
message to the respective topic.

But as per your suggestion, let me create close to 40-50 partitions for a
single topic and, when I am producing, do something like this:

RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
                                                   partition % 50,
                                                   RdKafka::Producer::RK_MSG_COPY,
                                                   ptr,
                                                   size,
                                                   &partitionKey,
                                                   NULL);

Abhi

On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen  wrote:

> Why do you have 10 topics?  It seems like you are treating topics like
> partitions and it's unclear why you don't just have 1 topic with 10, 20, or
> even 30 partitions. Ordering is only guaranteed at a partition level.
>
> In general, if you want to capacity plan for partitions, you benchmark a
> single partition and then divide your peak estimated throughput by the
> single-partition result.
>
> If you expect the peak throughput to increase over time then double your
> partition count to allow room to grow the number of consumers without
> having to repartition.
>
> Sizing can be a bit more tricky if you are using keys but it doesn't sound
> like you are if today you are publishing to topics the way you describe.
>
> -hans
>
> > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra  wrote:
> >
> > Guys, any views?
> >
> > Abhi
> >
> >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra 
> wrote:
> >>
> >> Hello
> >>
> >> I am using the librdkafka C++ library for my application.
> >>
> >> *My Kafka Cluster Set up*
> >> 2 Kafka Zookeepers running on 2 different instances
> >> 7 Kafka Brokers, 4 running on 1 machine and 3 running on the other machine
> >> Total 10 Topics, each with a partition count of 3 and a replication factor of 3.
> >>
> >> Now in my case I need to be very specific about the *message order* when I
> >> am consuming the messages. I know that if all the messages get produced to
> >> the same partition, they always get consumed in the same order.
> >>
> >> I need expert opinions on what's the ideal partition count I should
> >> consider without affecting performance (I am looking for close to
> >> 100,000 messages per second).
> >> The topics are numbered 0 to 9, and when I am producing messages I do
> >> something like uniqueUserId % 10, and then point to the respective topic,
> >> like 0 || 1 || 2 etc.
> >>
> >> Abhi
> >>
> >>
> >>
> >>
> >> --
> >> If you can't succeed, call it version 1.0
> >>
> >
> >
> >
> > --
> > If you can't succeed, call it version 1.0
>



-- 
If you can't succeed, call it version 1.0


Re: Regarding Kafka

2016-10-08 Thread Hans Jespersen
Why do you have 10 topics?  It seems like you are treating topics like 
partitions and it's unclear why you don't just have 1 topic with 10, 20, or 
even 30 partitions. Ordering is only guaranteed at a partition level.

In general, if you want to capacity plan for partitions, you benchmark a single
partition and then divide your peak estimated throughput by the
single-partition result.
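
For example (made-up numbers): if a single partition benchmarks at roughly 10
MB/s of sustained produce throughput and your estimated peak is 50 MB/s, you
need at least 50 / 10 = 5 partitions; doubling that for growth headroom gives
10.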

If you expect the peak throughput to increase over time then double your 
partition count to allow room to grow the number of consumers without having to 
repartition.

Sizing can be a bit more tricky if you are using keys, but it doesn't sound
like you are, given that today you publish to topics the way you describe.

-hans

> On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra  wrote:
> 
> Guys, any views?
> 
> Abhi
> 
>> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra  wrote:
>> 
>> Hello
>> 
>> I am using the librdkafka C++ library for my application.
>> 
>> *My Kafka Cluster Set up*
>> 2 Kafka Zookeepers running on 2 different instances
>> 7 Kafka Brokers, 4 running on 1 machine and 3 running on the other machine
>> Total 10 Topics, each with a partition count of 3 and a replication factor of 3.
>> 
>> Now in my case I need to be very specific about the *message order* when I
>> am consuming the messages. I know that if all the messages get produced to the
>> same partition, they always get consumed in the same order.
>> 
>> I need expert opinions on what's the ideal partition count I should
>> consider without affecting performance (I am looking for close to 100,000
>> messages per second).
>> The topics are numbered 0 to 9, and when I am producing messages I do something
>> like uniqueUserId % 10, and then point to the respective topic, like 0 ||
>> 1 || 2 etc.
>> 
>> Abhi
>> 
>> 
>> 
>> 
>> --
>> If you can't succeed, call it version 1.0
>> 
> 
> 
> 
> -- 
> If you can't succeed, call it version 1.0


Re: Regarding Kafka

2016-10-08 Thread Abhit Kalsotra
Guys, any views?

Abhi

On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra  wrote:

> Hello
>
> I am using the librdkafka C++ library for my application.
>
> *My Kafka Cluster Set up*
> 2 Kafka Zookeepers running on 2 different instances
> 7 Kafka Brokers, 4 running on 1 machine and 3 running on the other machine
> Total 10 Topics, each with a partition count of 3 and a replication factor of 3.
>
> Now in my case I need to be very specific about the *message order* when I
> am consuming the messages. I know that if all the messages get produced to the
> same partition, they always get consumed in the same order.
>
> I need expert opinions on what's the ideal partition count I should
> consider without affecting performance (I am looking for close to 100,000
> messages per second).
> The topics are numbered 0 to 9, and when I am producing messages I do something
> like uniqueUserId % 10, and then point to the respective topic, like 0 ||
> 1 || 2 etc.
>
> Abhi
>
>
>
>
> --
> If you can't succeed, call it version 1.0
>



-- 
If you can't succeed, call it version 1.0


RE: Regarding kafka partition and replication

2016-07-19 Thread Tauzell, Dave
Having multiple brokers on the same node has a couple of problems for a 
production installation:

1. You'll have multiple brokers contending for disk and memory resources
2. You could have your partitions replicated to the same node, which means
that if that node fails you would lose data.

I think you are better off having 3 nodes with one broker each.  You can keep
the 9 partitions in case you want to add physical nodes in the future, and use
a replication factor of 2 or 3.
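
For example, creating such a topic with the stock tooling (host names are
placeholders for your cluster):

    bin/kafka-topics.sh --create --zookeeper zk1:2181 \
        --topic my-topic --partitions 9 --replication-factor 3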

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Amit K [mailto:amitk@gmail.com]
Sent: Monday, July 18, 2016 8:55 PM
To: users@kafka.apache.org
Subject: Regarding kafka partition and replication

Hi,

I have a Kafka cluster of 3 nodes, each with 3 brokers, along with a 3-node
zookeeper cluster. So in total 9 brokers spread across 3 different machines. I
am tied to Kafka 0.9.

In order to optimally use the infrastructure for 2 topics (the count is not
expected to grow drastically in the near future), I am thinking of having 9
partitions with a replication factor of 3 (or 6?). Will this give me a good
distribution of partitions and replicas across brokers? The system does not
have a huge load (<50 requests/sec of less than 1 KB each) as of now and is
not expected to get a higher load than this.

If this partition and replication scheme does not help, please suggest a
better one.

Also, please point me to any articles or documents about setting up a
multi-node Kafka cluster with regard to partitioning, replication, and general
properties to be used (good practices etc.).

Thanks,
Amit


Re: Regarding Kafka Log compaction Features

2016-05-06 Thread Spico Florin
Hi!

Please have a look at this article; it helped me to use the log compaction
mechanism:

http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/

I hope that it helps.

Regards,
Florin
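
P.S. One thing worth checking for this use case: the log cleaner never
compacts the active segment, so an updated key can keep showing its old value
to a from-the-beginning consumer until the segment rolls and the cleaner runs.
The relevant topic-level settings look roughly like this (the values are
illustrative, tuned for a quick test rather than production):

    cleanup.policy=compact
    segment.ms=60000                  # roll segments quickly so the cleaner can see them
    min.cleanable.dirty.ratio=0.01    # let the cleaner kick in aggressively
    delete.retention.ms=100           # how long tombstones survive after compaction

Even then, compaction only guarantees that at least the latest value per key
is retained; a consumer replaying the topic should treat the last record seen
for a key as the current one.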
On Thursday, May 5, 2016, Behera, Himansu (Contractor) <
himansu_beh...@cable.comcast.com> wrote:

> Hi Team,
>
>
>
> I am working on implementing the kafka log compaction feature in my
> project.
>
>
>
> Please find the server.properties attached. I have made all the config
> changes needed/suggested in the Kafka log compaction forum, but was not able
> to resolve the issue.
>
> My use case is as follows:
>
> Step 1: We send a keyed message (String, String) from one of the producers
> to the topic.
> Step 2: Then we send around 10 million keyed messages (with unique keys) to
> the above topic.
> Step 3: Then we try to send an update to the key from step 1, with a value
> other than the one in step 1, after 1800 secs.
> Step 3:Then  we try to send update to  the key in step 1 with some other
> value other than  in step 1 after 1800 secs
>
>
>
> Expected Result: The key should be updated with the most recent value.
>
> Actual Result: The updated key still contains the old value.
>
>
>
>
>
> I would appreciate it if someone could help me with implementing the log
> compaction feature POC.
>
>
>
> Please find the server.properties attached for your reference.
>
>
>
> Regards,
>
> Himansu
>
>
>


Re: regarding Kafka Queuing system

2016-02-26 Thread Alexis Midon
You can fetch messages by offset.
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchRequest
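
With the Java consumer, that maps to assign() plus seek() (a sketch; the topic
name, partition, and the already-configured consumer and offset variables are
assumptions):

    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // consumer is a configured KafkaConsumer<String, byte[]>; offset is the
    // position you stored when the trap object was produced.
    TopicPartition tp = new TopicPartition("snmp-traps", 0);
    consumer.assign(Collections.singletonList(tp));
    consumer.seek(tp, offset);                  // jump straight to the stored offset
    ConsumerRecords<String, byte[]> records = consumer.poll(1000);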


On Fri, Feb 26, 2016 at 7:23 AM rahul shukla 
wrote:

> Hello,
> I am working on an SNMP trap parsing project for my academics. I am using
> the Kafka messaging system in my project. Actually, I want to store the trap
> object (which comes from the SNMP agent) in Kafka, and retrieve that object
> on the other side for further processing.
> So, my query is: is there any way to store a particular event in
> Kafka and retrieve that event on the other side for further processing?
>
> Please assist me.
>
>
> Thanks & Regards
> Rahul Shukla
> +91-9826277980
>


Re: Regarding Kafka release 0.8.2-beta

2015-01-26 Thread Jason Rosenberg
Shouldn't the new consumer API be removed from the 0.8.2 code base then?

On Fri, Jan 23, 2015 at 10:30 AM, Joe Stein joe.st...@stealth.ly wrote:

 The new consumer is scheduled for 0.9.0.

 Currently Kafka release candidate 2 for 0.8.2.0 is being voted on.

 There is an in progress patch to the new consumer that you can try out
 https://issues.apache.org/jira/browse/KAFKA-1760

 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /

 On Fri, Jan 23, 2015 at 1:55 AM, Reeni Mathew reenimathew...@gmail.com
 wrote:

  Hi Team,
 
  I was playing around with your recent release 0.8.2-beta.
   Producer worked fine, whereas the new consumer did not.
 
  org.apache.kafka.clients.consumer.KafkaConsumer
 
   After digging into the code I realized that the implementation for the same
   is not available; only the API is present.
   Could you please let me know when we can expect the implementation of
   the same.
  
   Thanks & Regards
 
  Reeni
 



Re: Regarding Kafka release 0.8.2-beta

2015-01-26 Thread Jun Rao
The new consumer API is actually excluded from the javadoc that we generate.

Thanks,

Jun

On Mon, Jan 26, 2015 at 11:54 AM, Jason Rosenberg j...@squareup.com wrote:

 Shouldn't the new consumer API be removed from the 0.8.2 code base then?

 On Fri, Jan 23, 2015 at 10:30 AM, Joe Stein joe.st...@stealth.ly wrote:

  The new consumer is scheduled for 0.9.0.
 
  Currently Kafka release candidate 2 for 0.8.2.0 is being voted on.
 
  There is an in progress patch to the new consumer that you can try out
  https://issues.apache.org/jira/browse/KAFKA-1760
 
  /***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
  /
 
  On Fri, Jan 23, 2015 at 1:55 AM, Reeni Mathew reenimathew...@gmail.com
  wrote:
 
   Hi Team,
  
   I was playing around with your recent release 0.8.2-beta.
    Producer worked fine, whereas the new consumer did not.
  
   org.apache.kafka.clients.consumer.KafkaConsumer
  
    After digging into the code I realized that the implementation for the same
    is not available; only the API is present.
    Could you please let me know when we can expect the implementation of
    the same.
   
    Thanks & Regards
  
   Reeni
  
 



Re: Regarding Kafka release 0.8.2-beta

2015-01-26 Thread Joe Stein
Maybe we should add "experimental" to the documentation so folks that don't
know will understand.

/***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
/
On Jan 26, 2015 11:56 AM, Jason Rosenberg j...@squareup.com wrote:

 Shouldn't the new consumer API be removed from the 0.8.2 code base then?

 On Fri, Jan 23, 2015 at 10:30 AM, Joe Stein joe.st...@stealth.ly wrote:

  The new consumer is scheduled for 0.9.0.
 
  Currently Kafka release candidate 2 for 0.8.2.0 is being voted on.
 
  There is an in progress patch to the new consumer that you can try out
  https://issues.apache.org/jira/browse/KAFKA-1760
 
  /***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
  /
 
  On Fri, Jan 23, 2015 at 1:55 AM, Reeni Mathew reenimathew...@gmail.com
  wrote:
 
   Hi Team,
  
   I was playing around with your recent release 0.8.2-beta.
    Producer worked fine, whereas the new consumer did not.
  
   org.apache.kafka.clients.consumer.KafkaConsumer
  
    After digging into the code I realized that the implementation for the same
    is not available; only the API is present.
    Could you please let me know when we can expect the implementation of
    the same.
   
    Thanks & Regards
  
   Reeni