[ https://issues.apache.org/jira/browse/KAFKA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922366#comment-16922366 ]
dingsainan commented on KAFKA-8738:
-----------------------------------

Sorry for my late reply. Below are the details of this case.

h1. 3. Execution process

h2. 3.1 Command executed

{code}
./kafka-reassign-partitions.sh --zookeeper zkAddress --bootstrap-server bootstrap --reassignment-json-file /mnt/storage00/Nora/reassignment211.json --execute
{code}

h2. 3.2 First reassignment

{code}
{"partitions":
  [{"topic": "lancer_ops_billions_all_log_json_billions",
    "partition": 1,
    "replicas": [6,15],
    "log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
}
{code}

h2. 3.3 Second reassignment

{code}
{"partitions":
  [{"topic": "lancer_ops_billions_all_log_json_billions",
    "partition": 1,
    "replicas": [6,15],
    "log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
}
{code}

{{Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1))}}

h2. 3.4 Result

Log retention no longer takes effect for the partition lancer_ops_billions_all_log_json_billions-1.

State transitions for this topic-partition, starting from None:
--->Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1))
--->Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(2))
--->Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1))

It finally stays in the LogCleaningPaused(1) state.

h2. 3.5 Code block that causes the problem

{code}
// The resume operation is performed only when the log is not a future log
if (cleaner != null && !isFuture) {
  trace(s"the cleaner is not null and isFuture is false")
  cleaner.abortCleaning(topicPartition)
  cleaner.updateCheckpoints(removedLog.dir.getParentFile)
}
{code}

h2. 3.6 Fix approach

{code}
// Cleaning must also be resumed for the topic-partition when the log is a future log
{code}

[^migrationCase.pdf]

> Cleaning thread blocked when more than one ALTER_REPLICA_LOG_DIRS requests sent
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-8738
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8738
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.1.1
>            Reporter: dingsainan
>            Priority: Major
>         Attachments: migrationCase.pdf
>
>
> Hi,
>
> I have run into a situation where the log cleaner does not work for a
> topic-partition after the kafka-reassign-partitions.sh tool (V2.1.1) is run
> against it more than once in quick succession.
>
> My operation:
> I submitted one task that moves a replica between log directories on the
> same broker. While that task was still in progress, I submitted a new task
> for the same topic-partition.
>
> {code:java}
> // the first task:
> {"partitions":
>   [{"topic": "lancer_ops_billions_all_log_json_billions",
>     "partition": 1,
>     "replicas": [6,15],
>     "log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
> }
> // the second task:
> {"partitions":
>   [{"topic": "lancer_ops_billions_all_log_json_billions",
>     "partition": 1,
>     "replicas": [6,15],
>     "log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
> }
> {code}
>
> My findings:
> Kafka executes abortAndPauseCleaning() as soon as a task is submitted.
> Shortly afterwards, another task is submitted for the same topic-partition,
> so the cleaning state is now {color:#ff0000}LogCleaningPaused(2){color}.
> When the second task completes, cleaning is resumed once for this
> topic-partition. In my case, the first task is killed directly and no
> resumeClean() is executed for it, so when the second task completes, the
> cleaning state for the topic-partition is still
> {color:#ff0000}LogCleaningPaused(1){color}, which blocks the cleaner thread
> for the topic-partition.
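
The counting behaviour described above can be illustrated with a small sketch. This is a hypothetical, simplified model and not Kafka's actual LogCleanerManager code: {{PauseTracker}}, {{PauseCounterDemo}}, and the {{resumeCleaning}} method name are assumptions made for illustration, and the partition is keyed by a plain string. It only shows how two pauses followed by a single resume leave the partition in LogCleaningPaused(1).

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the per-partition cleaning state.
// The state name mirrors the one quoted in the report; everything else
// here is an assumption for illustration, not Kafka's implementation.
sealed trait CleaningState
final case class LogCleaningPaused(pausedCount: Int) extends CleaningState

final class PauseTracker {
  // Partitions absent from this map are free to be cleaned.
  private val inProgress = mutable.Map.empty[String, CleaningState]

  // Each pause request increments the counter for the partition.
  def abortAndPauseCleaning(tp: String): Unit = inProgress.get(tp) match {
    case Some(LogCleaningPaused(n)) => inProgress.update(tp, LogCleaningPaused(n + 1))
    case None                       => inProgress.update(tp, LogCleaningPaused(1))
  }

  // Each resume decrements the counter; the entry is removed only at zero.
  def resumeCleaning(tp: String): Unit = inProgress.get(tp) match {
    case Some(LogCleaningPaused(1)) => inProgress.remove(tp)
    case Some(LogCleaningPaused(n)) => inProgress.update(tp, LogCleaningPaused(n - 1))
    case None =>
      throw new IllegalStateException(s"Cleaning of $tp is not paused")
  }

  def state(tp: String): Option[CleaningState] = inProgress.get(tp)
}

object PauseCounterDemo {
  // Replays the scenario from the report: two tasks pause the same
  // partition, but only the second task ever resumes it.
  def transitions(): List[Option[CleaningState]] = {
    val tp = "lancer_ops_billions_all_log_json_billions-1"
    val tracker = new PauseTracker
    val initial = tracker.state(tp)
    tracker.abortAndPauseCleaning(tp) // first task submitted
    val afterFirst = tracker.state(tp)
    tracker.abortAndPauseCleaning(tp) // second task submitted
    val afterSecond = tracker.state(tp)
    tracker.resumeCleaning(tp)        // only the second task completes
    val afterResume = tracker.state(tp)
    List(initial, afterFirst, afterSecond, afterResume)
  }
}
```

In this model, {{PauseCounterDemo.transitions()}} ends with the partition still paused, because the pause taken by the first task is never matched by a resume. Any fix has to make sure every pause is paired with exactly one resume, including on the path where a task is cancelled.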
>
> _That is all I have found so far; please confirm._
>
> _Thanks_
> _Nora_



--
This message was sent by Atlassian Jira
(v8.3.2#803003)