[jira] [Commented] (KAFKA-8738) Cleaning thread blocked when more than one ALTER_REPLICA_LOG_DIRS requests sent

dingsainan (Jira) Wed, 04 Sep 2019 03:31:19 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922366#comment-16922366
 ]


dingsainan commented on KAFKA-8738:
-----------------------------------

Sorry for my late reply.

Below are the details of this case.

 
h1. 3. Execution process
h2. 3.1 Command executed
{code}
./kafka-reassign-partitions.sh --zookeeper zkAddress --bootstrap-server bootstrap --reassignment-json-file /mnt/storage00/Nora/reassignment211.json --execute
{code}
h2. 3.2 First reassignment
{code:java}
{"partitions":
            [{"topic": "lancer_ops_billions_all_log_json_billions",
              "partition": 1,
              "replicas": [6,15],
              "log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
}
{code}

 
h2. 3.3 Second reassignment
{code:java}
{"partitions":
            [{"topic": "lancer_ops_billions_all_log_json_billions",
              "partition": 1,
              "replicas": [6,15],
              "log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
}
{code}

{{Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1))}}
h2. 3.4 Result
The log retention time no longer takes effect for partition lancer_ops_billions_all_log_json_billions-1.
The state of this topic-partition changes as follows.

From

None

--->Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1))

--->Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(2))

--->Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1))

It finally stays in the LogCleaningPaused(1) state.
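
The transitions above can be reproduced with a toy pause counter. This is only a sketch of the counting behaviour described in this issue, not Kafka's actual cleaner code: the class {{PauseCounterDemo}} and its methods {{pause}}, {{resume}} and {{state}} are hypothetical names, with {{pause}} standing in for abortAndPauseCleaning() and {{resume}} for the single resume that runs when the second task completes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a per-partition pause counter (hypothetical, not Kafka code). */
class PauseCounterDemo {
    // topic-partition name -> number of outstanding pause requests
    private final Map<String, Integer> paused = new HashMap<>();

    /** Stands in for abortAndPauseCleaning(): each call adds one pause. */
    void pause(String tp) {
        paused.merge(tp, 1, Integer::sum);
    }

    /** Removes one pause; the entry disappears only when the count reaches zero. */
    void resume(String tp) {
        Integer count = paused.get(tp);
        if (count == null) {
            throw new IllegalStateException(tp + " is not paused");
        }
        if (count == 1) {
            paused.remove(tp);
        } else {
            paused.put(tp, count - 1);
        }
    }

    String state(String tp) {
        Integer count = paused.get(tp);
        return count == null ? "None" : "LogCleaningPaused(" + count + ")";
    }

    /** Replays the sequence from this report and records each observed state. */
    static List<String> transitions() {
        String tp = "lancer_ops_billions_all_log_json_billions-1";
        PauseCounterDemo manager = new PauseCounterDemo();
        List<String> seen = new ArrayList<>();
        seen.add(manager.state(tp));
        manager.pause(tp);   // first reassignment submitted
        seen.add(manager.state(tp));
        manager.pause(tp);   // second reassignment submitted while the first is running
        seen.add(manager.state(tp));
        manager.resume(tp);  // second task completes; the first task never resumes
        seen.add(manager.state(tp));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ---> ", transitions()));
    }
}
```

In this model the counter never returns to None, because two pauses are matched by only one resume.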

 
h2. 3.5 Code block that causes the problem
{code:java}
// The resume operation is executed only when the log is not a future log
if (cleaner != null && !isFuture) {
  trace(s"the cleaner is not null and isFuture is false")

  cleaner.abortCleaning(topicPartition)
  cleaner.updateCheckpoints(removedLog.dir.getParentFile)
}
{code}
h2. 3.6 Fix
{code:java}
// Cleaning must also be resumed for the topic-partition when the log is a future log
{code}
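
A rough illustration of the idea, under stated assumptions: the sketch below is a toy counter, not a patch against Kafka. {{FutureLogResumeSketch}}, {{deleteLog}} and the {{resumeForFuture}} flag are hypothetical names, and the model assumes that the first task's future log is discarded when the second task supersedes it.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy comparison of the pause count with and without a resume for future logs. */
class FutureLogResumeSketch {
    // topic-partition name -> number of outstanding pause requests
    private final Map<String, Integer> paused = new HashMap<>();

    void pause(String tp) {
        paused.merge(tp, 1, Integer::sum);
    }

    void resume(String tp) {
        // Returning null from the remapping function removes the entry.
        paused.computeIfPresent(tp, (key, count) -> count == 1 ? null : count - 1);
    }

    int pauseCount(String tp) {
        return paused.getOrDefault(tp, 0);
    }

    /**
     * Hypothetical deletion hook. With resumeForFuture = false it mirrors the
     * reported behaviour (nothing is resumed for a future log); with true it
     * models the proposed change.
     */
    void deleteLog(String tp, boolean isFuture, boolean resumeForFuture) {
        if (isFuture && resumeForFuture) {
            resume(tp);
        }
    }

    static int run(boolean resumeForFuture) {
        String tp = "lancer_ops_billions_all_log_json_billions-1";
        FutureLogResumeSketch manager = new FutureLogResumeSketch();
        manager.pause(tp);                             // first task submitted
        manager.pause(tp);                             // second task submitted
        manager.deleteLog(tp, true, resumeForFuture);  // first task's future log discarded (assumption)
        manager.resume(tp);                            // second task completes
        return manager.pauseCount(tp);
    }

    public static void main(String[] args) {
        System.out.println("pause count without the change: " + run(false));
        System.out.println("pause count with the change: " + run(true));
    }
}
```

Without the extra resume the count ends at 1, which matches the stuck LogCleaningPaused(1) state; with it the count ends at 0 in this model.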
 
[^migrationCase.pdf]

> Cleaning thread blocked when more than one ALTER_REPLICA_LOG_DIRS requests 
> sent
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-8738
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8738
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.1.1
>            Reporter: dingsainan
>            Priority: Major
>         Attachments: migrationCase.pdf
>
>
> Hi,
>   
>  I am seeing a situation in which the log cleaner stops working for a 
> topic-partition after the kafka-reassign-partitions.sh tool (V2.1.1) is run 
> against it more than once in quick succession.
>   
>  My operation:
>  I first submit a task that moves a replica within the same broker. While 
> that task is still in progress, I submit a new task for the same 
> topic-partition.
>  
> {code:java}
> // the first task:
> {"partitions":
>             [{"topic": "lancer_ops_billions_all_log_json_billions",
>               "partition": 1,
>               "replicas": [6,15],
>               "log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
> }
> //the second task
> {"partitions":
>             [{"topic": "lancer_ops_billions_all_log_json_billions",
>               "partition": 1,
>               "replicas": [6,15],
>               "log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
> }
>  
> {code}
>  
>  My analysis:
>  Kafka executes abortAndPauseCleaning() once a task is submitted. Shortly 
> afterwards, another task is submitted for the same topic-partition, so the 
> cleaning state becomes {color:#ff0000}LogCleaningPaused(2){color}. When the 
> second task completes, cleaning is resumed for this topic-partition only 
> once. In my case the first task is killed directly and no resumeClean() is 
> executed for it, so after the second task completes the cleaning state for 
> the topic-partition is still {color:#ff0000}LogCleaningPaused(1){color}, 
> which blocks the cleaner thread for that topic-partition.
>   
>  _That is all I have found so far; please confirm._
>   
>  _Thanks_
>  _Nora_



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
