[ 
https://issues.apache.org/jira/browse/HDDS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Gui updated HDDS-5394:
---------------------------
    Description: 
After HDDS-5268, datanode data volumes and ratis volumes are checked in a 
single periodic volume checker together.

In practice, however, data volumes and ratis volumes are checked in two 
separate `checkAllVolumes` calls. Each `checkAllVolumes` call checks whether it 
runs within the time gap, controlled by 'disk.check.min.gap', of the previous 
call and skips the check if so. Because the call for ratis volumes directly 
follows the call for data volumes, the ratis volumes are always skipped.

To fix this, we could move the gap check into `checkAllVolumeSets`, which checks 
the volume sets one by one in a single pass.
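
The difference between the two placements of the gap check can be sketched as 
follows. This is a simplified, hypothetical model and not the Ozone code: the 
class `MinGapSketch`, the method `checkAllVolumesGated`, the string volume-set 
names, and the 5 ms delay between the two calls are all assumptions made for 
illustration. Only the names `checkAllVolumes` and `checkAllVolumeSets` and the 
15-minute default come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of the min-gap throttle; not the Ozone code. */
class MinGapSketch {
    // Assumed to mirror the 15-minute default of 'disk.check.min.gap'.
    static final long MIN_GAP_MS = 15 * 60 * 1000L;

    private Long lastCheckMs = null;               // null until the first check runs
    final List<String> checked = new ArrayList<>(); // volume sets that were really checked

    private boolean gapElapsed(long nowMs) {
        return lastCheckMs == null || nowMs - lastCheckMs >= MIN_GAP_MS;
    }

    /** Current shape: the gap test sits inside every per-set call. */
    void checkAllVolumesGated(String volumeSet, long nowMs) {
        if (!gapElapsed(nowMs)) {
            return; // skipped: too soon after the previous call
        }
        lastCheckMs = nowMs;
        checked.add(volumeSet);
    }

    /** Proposed shape: one gap test per pass, then every set is checked. */
    void checkAllVolumeSets(List<String> volumeSets, long nowMs) {
        if (!gapElapsed(nowMs)) {
            return;
        }
        lastCheckMs = nowMs;
        checked.addAll(volumeSets);
    }

    /** Two periodic passes with the gap test inside each call. */
    static List<String> runGatedPerCall() {
        MinGapSketch c = new MinGapSketch();
        long now = 0;
        for (int pass = 0; pass < 2; pass++) {
            c.checkAllVolumesGated("data", now);
            c.checkAllVolumesGated("ratis", now + 5); // a few ms later, inside the gap
            now += MIN_GAP_MS;
        }
        return c.checked;
    }

    /** Two periodic passes with the gap test hoisted to the pass level. */
    static List<String> runGatedPerPass() {
        MinGapSketch c = new MinGapSketch();
        long now = 0;
        for (int pass = 0; pass < 2; pass++) {
            c.checkAllVolumeSets(Arrays.asList("data", "ratis"), now);
            now += MIN_GAP_MS;
        }
        return c.checked;
    }

    public static void main(String[] args) {
        System.out.println(runGatedPerCall()); // prints [data, data]
        System.out.println(runGatedPerPass()); // prints [data, ratis, data, ratis]
    }
}
```

In the first variant the "ratis" set never gets checked, because its call 
always lands inside the gap opened by the "data" call just before it.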

There is also another problem. Two volume checkers are implemented in the 
datanode:
 * Periodic Volume Checker
 * On-demand Volume Checker (HDDS-5089)

The periodic volume checker is scheduled at a fixed rate, every 15 minutes by 
default. But 'disk.check.min.gap' also defaults to 15 minutes, and it also 
controls the time gap between two successive checks of a single volume. So 
within the 15 minutes between two periodic checks, no on-demand check can 
happen.

To fix this, we could make 'periodic.disk.check.interval.minutes' longer, such 
as 1 hour. Since we have the on-demand disk checker, this should be fine.
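
The interaction between the periodic interval and the min gap can be checked 
with a small timeline. This is a hypothetical single-volume model, not the 
Ozone implementation: the class `OnDemandGapSketch`, the assumption that an 
on-demand check is requested every 5 minutes, the 2-hour horizon, and the rule 
that a check runs once the elapsed time is at least the gap are all 
illustrative assumptions.

```java
/** Hypothetical single-volume timeline; not the Ozone implementation. */
class OnDemandGapSketch {
    /**
     * Counts the on-demand checks that actually run before the horizon.
     * All times are in minutes.
     */
    static int honoredOnDemand(int periodicInterval, int minGap, int horizon) {
        int lastCheck = -minGap; // lets the first check at t = 0 run
        int honored = 0;
        for (int t = 0; t < horizon; t++) {
            boolean periodicTick = t % periodicInterval == 0;
            // Assumed request pattern: an on-demand request every 5 minutes,
            // except at minutes where the periodic checker fires.
            boolean onDemandRequest = !periodicTick && t % 5 == 0;
            if (!periodicTick && !onDemandRequest) {
                continue; // nothing asks for a check at this minute
            }
            if (t - lastCheck < minGap) {
                continue; // throttled by the min gap
            }
            lastCheck = t;
            if (onDemandRequest) {
                honored++;
            }
        }
        return honored;
    }

    public static void main(String[] args) {
        // Interval 15 min, gap 15 min: every on-demand request is throttled.
        System.out.println(honoredOnDemand(15, 15, 120)); // prints 0
        // Interval 60 min, gap 15 min: on-demand checks can run in between.
        System.out.println(honoredOnDemand(60, 15, 120)); // prints 6
    }
}
```

With both values at 15 minutes, each periodic check reopens the gap just as it 
expires, so on-demand requests never get through in this model. With a 1-hour 
interval, the gap expires well before the next periodic check.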

> Fix skipped volume check due to disk.check.min.gap
> --------------------------------------------------
>
>                 Key: HDDS-5394
>                 URL: https://issues.apache.org/jira/browse/HDDS-5394
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major


