Request to join Assignee list

2020-08-31 Thread Yingshuan Song
Hi,

Could you please add me to the JIRA Assignee list? I would like to start
contributing.

JIRA username: ymxz
Full Name: songyingshuan

Apologies in case I've sent this request to the wrong mailing list.

Thanks,
Yingshuan Song


Re: Kafka cluster cannot connect to zookeeper

2020-08-31 Thread Yingshuan Song
Maybe you can list the modified configurations, and also the versions of
ZooKeeper and Kafka?
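
For reference, a minimal three-node ensemble configuration looks roughly
like this (a sketch only - the hostnames and paths here are placeholders,
not taken from your setup):

zoo.cfg on every ZooKeeper node (the myid file in dataDir must match that
node's server.N entry):

    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
    initLimit=10
    syncLimit=5
    server.1=zk1:2888:3888
    server.2=zk2:2888:3888
    server.3=zk3:2888:3888

server.properties on every Kafka broker:

    zookeeper.connect=zk1:2181,zk2:2181,zk3:2181

If the brokers' zookeeper.connect still points only at the original node,
or the myid files don't match the server.N entries, symptoms like yours
can appear.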

Li, Dingqun wrote on Fri, Aug 28, 2020 at 3:41 PM:

> We had one ZooKeeper node and two Kafka nodes. We then expanded the
> ZooKeeper ensemble: we changed the configuration of the existing
> ZooKeeper node, restarted it, and added two more ZooKeeper nodes. After
> that, our Kafka cluster could not connect to the ZooKeeper cluster, and
> there was no useful information in the logs.
>
> What should we do? Thank you
>


Re: will partition reassignment cause same message being processed in parallel?

2020-08-13 Thread Yingshuan Song
Yes, it is possible.

Think of this scenario:
  1 - Consumers A and B (with the same consumer group.id and auto commit
enabled) consume a topic T with 2 partitions, and the assignment is
(A => T-0, B => T-1).
  2 - The application runs correctly while records are sent to Kafka at a
low rate. Let's assume:
    2-1) each consumer can process 5 records per second;
    2-2) records are sent to topic T at 5 per partition per second;
    => so each poll returns 5 records, which the consumer processes
correctly within one second.
  3 - The rate of records sent to partition T-0 increases to 500/second,
so consumer A may poll 500 records the next time, and 500/5 = 100 seconds
are needed to process them.
  4 - If the consumer config 'max.poll.interval.ms' is less than 100
seconds, the broker will consider consumer A dead even though A is working
hard, and a rebalance will be triggered (the sketch below shows the
relevant settings).
  5 - Consumer B will take over both partitions in the next round of
rebalancing and reprocess the records from T-0.
  6 - Steps 4 and 5 will repeat, and the group will never recover
automatically.
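
As a rough illustration of the settings involved (a minimal sketch,
assuming a plain Java consumer; the broker address, group id, and topic
name are placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PollTuning {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "my-group"); // A and B share this group.id
            props.put("enable.auto.commit", "true");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            // Cap how many records one poll() may return, so that
            // (records per poll) * (processing time per record) stays
            // below max.poll.interval.ms.
            props.put("max.poll.records", "100");
            // Maximum allowed gap between two poll() calls before the
            // consumer is considered dead and a rebalance is triggered.
            props.put("max.poll.interval.ms", "300000"); // default: 5 minutes

            try (KafkaConsumer<String, String> consumer =
                         new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("T"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record); // must finish well within the interval
                    }
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            // application logic goes here
        }
    }

Lowering max.poll.records (or raising max.poll.interval.ms) breaks the
loop in step 6, because each polled batch then fits inside the allowed
interval.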

Hope this helps

Jin, Yun wrote on Wed, Aug 12, 2020 at 5:18 PM:

> Hi,
>
> A partition may be reassigned if its consumer is not reachable. Will
> partition reassignment cause the same message to be processed in
> parallel? Suppose Kafka finds that consumer A is not reachable (maybe
> because of a network problem) and assigns the partition to consumer B.
> But consumer A is actually still running, which may cause a situation
> where the same message is processed in parallel by consumer A and
> consumer B. If the logic for processing the message includes updating
> some database tables, we have to lock some rows first. If such parallel
> processing of the same message cannot happen, we don't need the lock.
> I'd appreciate it if you can provide some information on this.
>
> Best regards,
> Yun
>


Re: Kafka topic partition distributing evenly on disks

2020-08-07 Thread Yingshuan Song
Hi Peter,
Agreed with Manoj and Vinicius; I think these rules lead to this result:

1) The partition count N of a topic and its replication factor R determine
the topic's total number of partition replicas, which is N * R.
2) Kafka distributes partitions evenly among brokers, but the distribution
is based on the broker count at the time the topic is created; this is
important.
If we create a topic (N = 4, R = 3) in a Kafka cluster that contains 3
brokers, then 4 * 3 / 3 = 4 partition replicas will be assigned to each
broker.
But if a new broker is added to the cluster and another topic (N = 4,
R = 3) is created, then 4 * 3 / 4 = 3 partition replicas will be assigned
to each broker.
Kafka will not assign all of those partitions to the newly added broker
even though it is idle, and I think this is a shortcoming of Kafka.
The same rule applies at the disk level: when a set of partitions is
assigned to a specific broker, each of its disks gets the same number of
partitions, without considering the load on the disks at that time.
3) When a producer sends records to a topic, the partition is chosen as
follows:
3-1) if a record has a key, the partition number is calculated from the
key;
3-2) if records have no keys, the records are sent to each partition in
turn. So if there are lots of records with the same key, all of those
records will be sent to the same partition and may take up a lot of disk
space (see the sketch below).
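
To illustrate rule 3 (a minimal sketch, assuming a plain Java producer;
the broker address, topic name, and key are made up for the example):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer =
                         new KafkaProducer<>(props)) {
                // 3-1) Keyed record: the default partitioner hashes the key
                // (murmur2), so every record with key "user-42" lands on the
                // same partition - and therefore on the same broker disk.
                producer.send(new ProducerRecord<>("my-topic", "user-42", "v1"));

                // 3-2) Unkeyed record: partitions are chosen in turn
                // (round-robin in older clients, sticky batching in newer
                // ones), spreading the load across partitions.
                producer.send(new ProducerRecord<>("my-topic", "v2"));
            }
        }
    }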


Hope this helps

Vinicius Scheidegger wrote on Fri, Aug 7, 2020 at 6:10 AM:

> Hi Peter,
>
> AFAIK, everything depends on:
>
> 1) How you have configured your topic
>   a) the number of partitions (here I understand you have 15 partitions)
>   b) the partition replication configuration (each partition necessarily
> has a leader, primarily responsible for holding the data and for serving
> reads and writes); you can configure the topic to have a number of
> replicas
> 2) How you publish messages to the topic
>   a) The publisher is responsible for choosing the partition. This can be
> done consciously (by setting the partition id while sending the message
> to the topic) or unconsciously (by using the DefaultPartitioner or any
> other partitioner scheme).
>
> All messages sent to a specific partition will be written first to the
> leader (meaning that the disk holding the partition leader will receive
> the load) and then replicated to the followers.
> Kafka does not automatically distribute the data equally across the
> different brokers - you need to design your architecture with that in
> mind.
>
> I hope it helps
>
> On Thu, Aug 6, 2020 at 10:23 PM Péter Nagykátai  wrote:
>
> > I initially started with one data disk (mounted solely to hold Kafka
> > data) and recently added a new one.
> >
> > On Thu, Aug 6, 2020 at 10:13 PM  wrote:
> >
> > > What do you mean by older disk?
> > >
> > > On 8/6/20, 12:05 PM, "Péter Nagykátai"  wrote:
> > >
> > > Yeah, but it doesn't do that. My "older" disks have ~70 partitions,
> > > the newer ones ~5 partitions. That's why I'm asking what went wrong.
> > >
> > > On Thu, Aug 6, 2020 at 8:35 PM  wrote:
> > >
> > > > Kafka distributes partitions evenly across the disks, so in your
> > > > case every disk should have 3/2 topic partitions.
> > > > It is the producer's job to spread data evenly across the topic
> > > > partitions via the partition key.
> > > > How is the partition key produced - is it auto-generated, or does
> > > > the producer send a key along with each message?
> > > >
> > > >
> > > > On 8/6/20, 7:29 AM, "Péter Nagykátai"  wrote:
> > > >
> > > > Hello,
> > > >
> > > > I have a Kafka cluster with 3 brokers (v2.3.0), and each broker
> > > > has 2 disks attached. I added a new (heavyweight) topic and was
> > > > surprised that even though the topic has 15 partitions, those
> > > > weren't distributed evenly on the disks. Thus I got one disk that's
> > > > almost empty and the other almost filled up. Is there any way to
> > > > have Kafka distribute data evenly on its disks?
> > > >
> > > > Thank you!