Unused options in ConsumerPerformance

2020-06-09 Thread Jiamei Xie
Hi
It seems  that Option numThreadsOpt and numFetchersOpt are unused in 
ConsumerPerformance. Could it be removed?


I have done benchmarks to get consumer performance vs threads and there were no 
big differences with different thread number.

I read source code core/src/main/scala/kafka/tools/ConsumerPerformance.scala 
and found that numThreadsOpt and numFetchersOpt are not used.


Best wishes
Jiamei Xie

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.


Re: Kafka Connect Connector Tasks Uneven Division

2020-06-09 Thread Deepak Raghav
Hi Robin

Thanks for your reply and accept my apology for the delayed response.

As you suggested that we should have a separate worker cluster based on
workload pattern. But as you said, task allocation is nondeterministic, so
same things can happen in the new cluster.

Please let me know if my understanding is correct or not.

Regards and Thanks
Deepak Raghav



On Tue, May 26, 2020 at 8:20 PM Robin Moffatt  wrote:

> The KIP for the current rebalancing protocol is probably a good reference:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-415:+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
>
>
> --
>
> Robin Moffatt | Senior Developer Advocate | ro...@confluent.io | @rmoff
>
>
> On Tue, 26 May 2020 at 14:25, Deepak Raghav 
> wrote:
>
> > Hi Robin
> >
> > Thanks for the clarification.
> >
> > As you suggested, that task allocation between the workers is
> > nondeterministic. I have shared the same information within in my team
> but
> > there are some other parties, with whom I need to share this information
> as
> > explanation for the issue raised by them and I cannot show this mail as a
> > reference.
> >
> > It would be very great if you please share any link/discussion reference
> > regarding the same.
> >
> > Regards and Thanks
> > Deepak Raghav
> >
> >
> >
> > On Thu, May 21, 2020 at 2:12 PM Robin Moffatt 
> wrote:
> >
> > > I don't think you're right to assert that this is "expected behaviour":
> > >
> > > >  the tasks are divided in below pattern when they are first time
> > > registered
> > >
> > > Kafka Connect task allocation is non-determanistic.
> > >
> > > I'm still not clear if you're solving for a theoretical problem or an
> > > actual one. If this is an actual problem that you're encountering and
> > need
> > > a solution to then since the task allocation is not deterministic it
> > sounds
> > > like you need to deploy separate worker clusters based on the workload
> > > patterns that you are seeing and machine resources available.
> > >
> > >
> > > --
> > >
> > > Robin Moffatt | Senior Developer Advocate | ro...@confluent.io |
> @rmoff
> > >
> > >
> > > On Wed, 20 May 2020 at 21:29, Deepak Raghav 
> > > wrote:
> > >
> > > > Hi Robin
> > > >
> > > > I had gone though the link you provided, It is not helpful in my
> case.
> > > > Apart from this, *I am not getting why the tasks are divided in
> *below
> > > > pattern* when they are *first time registered*, which is expected
> > > behavior.
> > > > I*s there any parameter which we can pass in worker property file
> which
> > > > handle the task assignment strategy like we have range assigner or
> > round
> > > > robin in consumer-group ?
> > > >
> > > > connector rest status api result after first registration :
> > > >
> > > > {
> > > >   "name": "REGION_CODE_UPPER-Cdb_Dchchargeableevent",
> > > >   "connector": {
> > > > "state": "RUNNING",
> > > > "worker_id": "10.0.0.5:*8080*"
> > > >   },
> > > >   "tasks": [
> > > > {
> > > >   "id": 0,
> > > >   "state": "RUNNING",
> > > >   "worker_id": "10.0.0.4:*8078*"
> > > > },
> > > > {
> > > >   "id": 1,
> > > >   "state": "RUNNING",
> > > >   "worker_id": "10.0.0.5:*8080*"
> > > > }
> > > >   ],
> > > >   "type": "sink"
> > > > }
> > > >
> > > > and
> > > >
> > > > {
> > > >   "name": "REGION_CODE_UPPER-Cdb_Neatransaction",
> > > >   "connector": {
> > > > "state": "RUNNING",
> > > > "worker_id": "10.0.0.4:*8078*"
> > > >   },
> > > >   "tasks": [
> > > > {
> > > >   "id": 0,
> > > >   "state": "RUNNING",
> > > >   "worker_id": "10.0.0.4:*8078*"
> > > > },
> > > > {
> > > >   "id": 1,
> > > >   "state": "RUNNING",
> > > >   "worker_id": "10.0.0.5:*8080*"
> > > > }
> > > >   ],
> > > >   "type": "sink"
> > > > }
> > > >
> > > >
> > > > But when I stop the second worker process and wait for
> > > > scheduled.rebalance.max.delay.ms time i.e 5 min to over, and start
> the
> > > > process again. Result is different.
> > > >
> > > > {
> > > >   "name": "REGION_CODE_UPPER-Cdb_Dchchargeableevent",
> > > >   "connector": {
> > > > "state": "RUNNING",
> > > > "worker_id": "10.0.0.5:*8080*"
> > > >   },
> > > >   "tasks": [
> > > > {
> > > >   "id": 0,
> > > >   "state": "RUNNING",
> > > >   "worker_id": "10.0.0.5:*8080*"
> > > > },
> > > > {
> > > >   "id": 1,
> > > >   "state": "RUNNING",
> > > >   "worker_id": "10.0.0.5:*8080*"
> > > > }
> > > >   ],
> > > >   "type": "sink"
> > > > }
> > > >
> > > > and
> > > >
> > > > {
> > > >   "name": "REGION_CODE_UPPER-Cdb_Neatransaction",
> > > >   "connector": {
> > > > "state": "RUNNING",
> > > > "worker_id": "10.0.0.4:*8078*"
> > > >   },
> > > >   "tasks": [
> > > > {
> > > >   "id": 0,
> > > >   "state": "RUNNING",
> > > >   "worker_id": "10.0.0.4:*8078*"
> > > > },
> > > > {
> > > >   "id": 1,
> > > >   "state": "RUNNING",
> > > >   "worker_id": "10.0.0

Re: Unused options in ConsumerPerformance

2020-06-09 Thread Ricardo Ferreira

Hi Jiamei,

Changes in Apache Kafka need to be handled via JIRA tickets 
. The best way to 
get started with this is joining the developer mailing list 
.


Thanks,

-- Ricardo

On 6/9/20 4:00 AM, Jiamei Xie wrote:

Hi
It seems  that Option numThreadsOpt and numFetchersOpt are unused in 
ConsumerPerformance. Could it be removed?


I have done benchmarks to get consumer performance vs threads and there were no 
big differences with different thread number.

I read source code core/src/main/scala/kafka/tools/ConsumerPerformance.scala 
and found that numThreadsOpt and numFetchersOpt are not used.


Best wishes
Jiamei Xie

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.



Protocol evolution/versioning docs are missing

2020-06-09 Thread Haruki Okada
Hi, Kafka.

While reading through protocol docs, I found that doc about protocol
evolution and versioning are missing in protocol.html, while toc contains a
section for it.

https://github.com/apache/kafka/blob/trunk/docs/protocol.html#L47-L50

Is there any plan to add a doc about protocol evolution?


Thanks,

-- 

Okada Haruki
ocadar...@gmail.com



回复: Unused options in ConsumerPerformance

2020-06-09 Thread Jiamei Xie
Hi Ricardo,
  I was a little confused about the function of Jira tickets and mailing list. 
Thanks for your clarify. I have created one, 
https://issues.apache.org/jira/browse/KAFKA-10126

发件人: Ricardo Ferreira 
发送时间: 2020年6月9日 19:58
收件人: users@kafka.apache.org; Jiamei Xie 
主题: Re: Unused options in ConsumerPerformance


Hi Jiamei,

Changes in Apache Kafka need to be handled via JIRA 
tickets. The best way to 
get started with this is joining the developer mailing 
list.

Thanks,

-- Ricardo
On 6/9/20 4:00 AM, Jiamei Xie wrote:

Hi

It seems  that Option numThreadsOpt and numFetchersOpt are unused in 
ConsumerPerformance. Could it be removed?





I have done benchmarks to get consumer performance vs threads and there were no 
big differences with different thread number.



I read source code core/src/main/scala/kafka/tools/ConsumerPerformance.scala 
and found that numThreadsOpt and numFetchersOpt are not used.





Best wishes

Jiamei Xie



IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.



IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.


Re: Kafka Connect Connector Tasks Uneven Division

2020-06-09 Thread Deepak Raghav
Hi  Robin

Can you please reply.

I just want to add one more thing, that yesterday I tried with
connect.protocal=eager. Task distribution was balanced after that.

Regards and Thanks
Deepak Raghav



On Tue, Jun 9, 2020 at 2:37 PM Deepak Raghav 
wrote:

> Hi Robin
>
> Thanks for your reply and accept my apology for the delayed response.
>
> As you suggested that we should have a separate worker cluster based on
> workload pattern. But as you said, task allocation is nondeterministic, so
> same things can happen in the new cluster.
>
> Please let me know if my understanding is correct or not.
>
> Regards and Thanks
> Deepak Raghav
>
>
>
> On Tue, May 26, 2020 at 8:20 PM Robin Moffatt  wrote:
>
>> The KIP for the current rebalancing protocol is probably a good reference:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-415:+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
>>
>>
>> --
>>
>> Robin Moffatt | Senior Developer Advocate | ro...@confluent.io | @rmoff
>>
>>
>> On Tue, 26 May 2020 at 14:25, Deepak Raghav 
>> wrote:
>>
>> > Hi Robin
>> >
>> > Thanks for the clarification.
>> >
>> > As you suggested, that task allocation between the workers is
>> > nondeterministic. I have shared the same information within in my team
>> but
>> > there are some other parties, with whom I need to share this
>> information as
>> > explanation for the issue raised by them and I cannot show this mail as
>> a
>> > reference.
>> >
>> > It would be very great if you please share any link/discussion reference
>> > regarding the same.
>> >
>> > Regards and Thanks
>> > Deepak Raghav
>> >
>> >
>> >
>> > On Thu, May 21, 2020 at 2:12 PM Robin Moffatt 
>> wrote:
>> >
>> > > I don't think you're right to assert that this is "expected
>> behaviour":
>> > >
>> > > >  the tasks are divided in below pattern when they are first time
>> > > registered
>> > >
>> > > Kafka Connect task allocation is non-determanistic.
>> > >
>> > > I'm still not clear if you're solving for a theoretical problem or an
>> > > actual one. If this is an actual problem that you're encountering and
>> > need
>> > > a solution to then since the task allocation is not deterministic it
>> > sounds
>> > > like you need to deploy separate worker clusters based on the workload
>> > > patterns that you are seeing and machine resources available.
>> > >
>> > >
>> > > --
>> > >
>> > > Robin Moffatt | Senior Developer Advocate | ro...@confluent.io |
>> @rmoff
>> > >
>> > >
>> > > On Wed, 20 May 2020 at 21:29, Deepak Raghav > >
>> > > wrote:
>> > >
>> > > > Hi Robin
>> > > >
>> > > > I had gone though the link you provided, It is not helpful in my
>> case.
>> > > > Apart from this, *I am not getting why the tasks are divided in
>> *below
>> > > > pattern* when they are *first time registered*, which is expected
>> > > behavior.
>> > > > I*s there any parameter which we can pass in worker property file
>> which
>> > > > handle the task assignment strategy like we have range assigner or
>> > round
>> > > > robin in consumer-group ?
>> > > >
>> > > > connector rest status api result after first registration :
>> > > >
>> > > > {
>> > > >   "name": "REGION_CODE_UPPER-Cdb_Dchchargeableevent",
>> > > >   "connector": {
>> > > > "state": "RUNNING",
>> > > > "worker_id": "10.0.0.5:*8080*"
>> > > >   },
>> > > >   "tasks": [
>> > > > {
>> > > >   "id": 0,
>> > > >   "state": "RUNNING",
>> > > >   "worker_id": "10.0.0.4:*8078*"
>> > > > },
>> > > > {
>> > > >   "id": 1,
>> > > >   "state": "RUNNING",
>> > > >   "worker_id": "10.0.0.5:*8080*"
>> > > > }
>> > > >   ],
>> > > >   "type": "sink"
>> > > > }
>> > > >
>> > > > and
>> > > >
>> > > > {
>> > > >   "name": "REGION_CODE_UPPER-Cdb_Neatransaction",
>> > > >   "connector": {
>> > > > "state": "RUNNING",
>> > > > "worker_id": "10.0.0.4:*8078*"
>> > > >   },
>> > > >   "tasks": [
>> > > > {
>> > > >   "id": 0,
>> > > >   "state": "RUNNING",
>> > > >   "worker_id": "10.0.0.4:*8078*"
>> > > > },
>> > > > {
>> > > >   "id": 1,
>> > > >   "state": "RUNNING",
>> > > >   "worker_id": "10.0.0.5:*8080*"
>> > > > }
>> > > >   ],
>> > > >   "type": "sink"
>> > > > }
>> > > >
>> > > >
>> > > > But when I stop the second worker process and wait for
>> > > > scheduled.rebalance.max.delay.ms time i.e 5 min to over, and start
>> the
>> > > > process again. Result is different.
>> > > >
>> > > > {
>> > > >   "name": "REGION_CODE_UPPER-Cdb_Dchchargeableevent",
>> > > >   "connector": {
>> > > > "state": "RUNNING",
>> > > > "worker_id": "10.0.0.5:*8080*"
>> > > >   },
>> > > >   "tasks": [
>> > > > {
>> > > >   "id": 0,
>> > > >   "state": "RUNNING",
>> > > >   "worker_id": "10.0.0.5:*8080*"
>> > > > },
>> > > > {
>> > > >   "id": 1,
>> > > >   "state": "RUNNING",
>> > > >   "worker_id": "10.0.0.5:*8080*"
>> > > > }
>> > > >   ],
>> > > >   "type": "sink"
>> > > > }
>> > > >
>> > > > and
>> > >