[ https://issues.apache.org/jira/browse/KAFKA-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514678#comment-16514678 ]
Ismael Juma commented on KAFKA-5857:
------------------------------------

Has anyone tried this with recent versions? This was improved significantly in 1.1.0.

> Excessive heap usage on controller node during reassignment
> ------------------------------------------------------------
>
>                 Key: KAFKA-5857
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5857
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.11.0.0
>         Environment: CentOS 7, Java 1.8
>            Reporter: Raoufeh Hashemian
>            Priority: Major
>              Labels: reliability
>             Fix For: 2.1.0
>
>         Attachments: CPU.png, disk_write_x.png, memory.png, reassignment_plan.txt
>
> I was trying to expand our Kafka cluster from 6 broker nodes to 12 broker nodes.
> Before the expansion, we had a single topic with 960 partitions and a replication factor of 3, so each node held 480 partitions. The size of the data on each node was 3 TB.
> To perform the expansion, I submitted a partition reassignment plan (see the attached file for the current/new assignments). The plan was optimized to minimize data movement and to be rack aware.
> After I submitted the plan, it took approximately 3 hours to move the data from the old nodes to the new ones. The cluster then started deleting the source partitions (judging by the number of open file descriptors) and rebalancing leaders, which was not successful. Meanwhile, heap usage on the controller node climbed steeply (along with long GC times); after 5 hours the controller ran out of memory, and the next controller showed the same behaviour for another 4 hours. At that point ZooKeeper ran out of disk and the service stopped.
> To recover from this condition:
> 1) Removed ZooKeeper logs to free up disk and restarted all 3 ZooKeeper nodes.
> 2) Deleted the /kafka/admin/reassign_partitions node from ZooKeeper.
> 3) Did unclean restarts of the Kafka service on the OOM'd controller nodes, which took 3 hours to complete. After this stage there were still 676 under-replicated partitions.
> 4) Did a clean restart on all 12 broker nodes.
> After step 4, the number of under-replicated partitions went to 0.
> So I was wondering: is this memory footprint on the controller expected for ~1K partitions? Did we do something wrong, or is it a bug?
> Attached are some resource usage graphs from this 30-hour event, along with the reassignment plan. I'll try to add log files as well.
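For reference, a minimal sketch of the commands that the submission and recovery steps above typically involve, assuming the stock Kafka 0.11 tooling and a /kafka chroot in ZooKeeper; the hostname and file name below are placeholders, not values taken from this ticket:

    # Submit the reassignment plan, then poll its status later (sketch only)
    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181/kafka \
        --reassignment-json-file reassignment_plan.txt --execute
    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181/kafka \
        --reassignment-json-file reassignment_plan.txt --verify

    # Recovery step 2 above: remove the in-progress reassignment znode
    bin/zookeeper-shell.sh zk1:2181
      delete /kafka/admin/reassign_partitions

Deleting the znode only clears the pending reassignment request; the brokers still need to be restarted (as in steps 3 and 4) before the under-replicated partition count recovers.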