[jira] [Comment Edited] (HDDS-13645) [DiskBalancer] Configuration Changes Not Reflected in DiskBalancer Status Report After Update on Specific DataNode

Gargi Jaiswal (Jira) Sun, 14 Sep 2025 23:43:40 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-13645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020262#comment-18020262
 ]


Gargi Jaiswal edited comment on HDDS-13645 at 9/15/25 6:42 AM:
---------------------------------------------------------------

Regarding the problem above: it is resolved by the work on HDDS-13665. The 
current code has two report methods, and *addDiskBalancerReport* adds a report 
to the heartbeat immediately. As a result, the updated configuration is not 
honoured, and the subsequent start diskBalancer command is sent without the 
updated values. I confirmed this in the SCM logs as well.
These are the exact commands and the scenario that cause the issue:
{code:java}
bash-5.1$ ozone admin datanode diskbalancer update -t 0.0002 -b 320 -p 10 -d ozone-ha-datanode-5
Update DiskBalancer Configuration on datanode(s): ozone-ha-datanode-5
bash-5.1$ ozone admin datanode diskbalancer start -d ozone-ha-datanode-5
Start DiskBalancer on datanode(s): ozone-ha-datanode-5{code}
{code:java}
2025-09-15 11:42:17 2025-09-15 06:12:17,121 [IPC Server handler 23 on default port 9860] INFO node.DiskBalancerManager: Sending diskBalancerCommand: opType=UPDATE, configuration=Disk Balancer Configuration values:
2025-09-15 11:42:17 Key Value
2025-09-15 11:42:17 Threshold 2.0E-4
2025-09-15 11:42:17 Max disk bandwidth 320
2025-09-15 11:42:17 Parallel Thread 10
2025-09-15 11:42:17 Stop After Disk Even true
2025-09-15 11:42:17 to Datanode 2e88e335-e13d-4897-9e0e-e58559e4fcb3(ozone-ha-datanode-5.ozone-ha_default/172.18.0.15)
2025-09-15 11:42:17 2025-09-15 06:12:17,269 [scm1-EventQueue-PipelineReportForPipelineReportHandler] INFO pipeline.PipelineReportHandler: Opened pipeline Pipeline-5f663f25-79c4-4ab1-907e-3b45d51658a3
2025-09-15 11:42:22 2025-09-15 06:12:22,538 [IPC Server handler 13 on default port 9860] INFO node.DiskBalancerManager: Sending diskBalancerCommand: opType=START, configuration=Disk Balancer Configuration values:
2025-09-15 11:42:22 Key Value
2025-09-15 11:42:22 Threshold 10.0
2025-09-15 11:42:22 Max disk bandwidth 10
2025-09-15 11:42:22 Parallel Thread 5
2025-09-15 11:42:22 Stop After Disk Even true
2025-09-15 11:42:22 to Datanode 2e88e335-e13d-4897-9e0e-e58559e4fcb3(ozone-ha-datanode-5.ozone-ha_default/172.18.0.15){code}
The logs above show the case where *addDiskBalancerReport* was adding the 
report to the heartbeat. After removing it and using 
{*}DiskBalancerReportPublisher{*} instead, each command is honoured even when 
the update and start commands try to change the same configuration.
Because *DiskBalancerReportPublisher* publishes periodically, there is a delay 
between reports: "{_}This delay creates a time window where the pending update 
command can finish applying the new configuration. By the time the next report 
is generated, it will likely reflect the correct, updated state, effectively 
masking the temporary inconsistency from the SCM.{_}"
This is supported by the SCM logs after the change:
{code:java}
2025-09-15 11:59:21 2025-09-15 06:29:21,558 [7756dbe9-9594-48a7-8281-cdd50db75a26-server-thread1] INFO server.RaftServerConfigKeys: raft.server.snapshot.creation.gap = 1024 (default)
2025-09-15 11:59:28 2025-09-15 06:29:28,792 [IPC Server handler 0 on default port 9860] INFO node.DiskBalancerManager: Sending diskBalancerCommand: opType=UPDATE, configuration=Disk Balancer Configuration values:
2025-09-15 11:59:28 Key Value
2025-09-15 11:59:28 Threshold 2.0E-4
2025-09-15 11:59:28 Max disk bandwidth 320
2025-09-15 11:59:28 Parallel Thread 6
2025-09-15 11:59:28 Stop After Disk Even true
2025-09-15 11:59:28 to Datanode bdf51d24-1db5-4089-a90d-846ccac7d54d(ozone-ha-datanode-5.ozone-ha_default/172.18.0.13)
2025-09-15 11:59:35 2025-09-15 06:29:35,037 [IPC Server handler 2 on default port 9860] INFO node.DiskBalancerManager: Sending diskBalancerCommand: opType=START, configuration=Disk Balancer Configuration values:
2025-09-15 11:59:35 Key Value
2025-09-15 11:59:35 Threshold 2.0E-4
2025-09-15 11:59:35 Max disk bandwidth 320
2025-09-15 11:59:35 Parallel Thread 6
2025-09-15 11:59:35 Stop After Disk Even true
2025-09-15 11:59:35 to Datanode bdf51d24-1db5-4089-a90d-846ccac7d54d(ozone-ha-datanode-5.ozone-ha_default/172.18.0.13){code}
After these changes, the update command is honoured correctly.
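
The timing difference described in this comment can be illustrated with a small toy model. This is only a sketch and not Ozone code: the class {{ReportTimingSketch}} and the types {{ToyScm}}, {{ToyDatanode}} and {{Config}} are hypothetical. It also assumes that the SCM builds the START command from the configuration in the last report it received, which is one possible reading of the behaviour seen in the logs.

```java
/**
 * Toy model of the report-timing issue. NOT Ozone code: every class and
 * method name below is hypothetical and exists only for illustration.
 */
public class ReportTimingSketch {

    /** Simplified DiskBalancer configuration (threshold only). */
    static final class Config {
        final double threshold;

        Config(double threshold) {
            this.threshold = threshold;
        }
    }

    /** SCM side: keeps the configuration from the last report it received. */
    static final class ToyScm {
        Config lastReported = new Config(10.0);

        void onReport(Config reported) {
            lastReported = reported;
        }

        /** Assumption: START reuses whatever configuration the SCM currently holds. */
        Config buildStartCommand() {
            return lastReported;
        }
    }

    /** Datanode side: a queued update takes effect only in processCommands(). */
    static final class ToyDatanode {
        Config active = new Config(10.0);
        Config pendingUpdate;

        void queueUpdate(Config update) {
            pendingUpdate = update;
        }

        void processCommands() {
            if (pendingUpdate != null) {
                active = pendingUpdate;
                pendingUpdate = null;
            }
        }

        Config report() {
            return active;
        }
    }

    /**
     * Returns the threshold carried by START.
     * reportBeforeApply = true models a report sent before the update is applied;
     * false models a periodic report generated after the update is applied.
     */
    static double startThreshold(boolean reportBeforeApply) {
        ToyScm scm = new ToyScm();
        ToyDatanode dn = new ToyDatanode();
        dn.queueUpdate(new Config(0.0002));

        if (reportBeforeApply) {
            scm.onReport(dn.report());   // report still carries the old value
            dn.processCommands();        // update applied too late for this report
        } else {
            dn.processCommands();        // update applied during the report delay
            scm.onReport(dn.report());   // next periodic report carries the new value
        }
        return scm.buildStartCommand().threshold;
    }

    public static void main(String[] args) {
        System.out.println("report before apply: START threshold = " + startThreshold(true));
        System.out.println("report after apply:  START threshold = " + startThreshold(false));
    }
}
```

In this model the first ordering yields a START command with the old threshold (10.0), and the second yields the updated one (0.0002), which mirrors the two sets of SCM logs above. The real interaction between heartbeats, command handling and the report publisher is more involved.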



> [DiskBalancer] Configuration Changes Not Reflected in DiskBalancer Status 
> Report After Update on Specific DataNode
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-13645
>                 URL: https://issues.apache.org/jira/browse/HDDS-13645
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Elavarasan Kathirvel
>            Assignee: Gargi Jaiswal
>            Priority: Critical
>
> *Description:*
> After updating the configuration for a specific DataNode and starting the 
> DiskBalancer job, the changes are not reflected in the status report.
> *Steps to Reproduce:*
>  # Update the configuration on a specific DataNode.
>  # Start the DiskBalancer job for that DataNode.
>  # Check the status report.
> *Expected Result:*
> The updated configuration should be reflected in the status report once the 
> DiskBalancer job starts.
> *Actual Result:*
> The configuration changes do not appear in the status report, indicating that 
> the update may not have been applied or recognized.



