[
https://issues.apache.org/jira/browse/HDDS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Gui updated HDDS-5394:
---------------------------
Description:
After HDDS-5268, datanode data volumes and ratis volumes are supposed to be
checked together by a single periodic volume checker.
In practice, data volumes and ratis volumes are checked in 2 separate
`checkAllVolumes` calls, and `checkAllVolumes` skips the check when 2
successive calls fall within the time gap controlled by 'disk.check.min.gap'.
The second call always falls within that gap, so ratis volumes are always
skipped.
To fix this, we could move the gap check into `checkAllVolumeSets`, which
checks the volume sets one by one in a single pass.
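A minimal sketch of the skip described above, assuming a simplified model
rather than the actual Ozone code. The class `MinGapSketch`, the constant
`MIN_GAP_MS`, the string volume set names and the explicit `nowMs` argument are
all hypothetical; only the method names `checkAllVolumes` and
`checkAllVolumeSets` come from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the min-gap throttle; not the actual Ozone code. */
public class MinGapSketch {
  // Stands in for 'disk.check.min.gap' (15 minutes), in milliseconds.
  static final long MIN_GAP_MS = 15 * 60 * 1000L;

  // Initialised so that the very first check is never throttled.
  private long lastCheckMs = -MIN_GAP_MS;

  // Names of the volume sets that were actually checked.
  final List<String> checked = new ArrayList<>();

  /** Problematic shape: the gap is enforced on every call. */
  boolean checkAllVolumes(String volumeSet, long nowMs) {
    if (nowMs - lastCheckMs < MIN_GAP_MS) {
      return false; // throttled: this volume set is skipped
    }
    lastCheckMs = nowMs;
    checked.add(volumeSet);
    return true;
  }

  /** Proposed shape: enforce the gap once per pass, then check every set. */
  boolean checkAllVolumeSets(List<String> volumeSets, long nowMs) {
    if (nowMs - lastCheckMs < MIN_GAP_MS) {
      return false; // the whole pass is throttled
    }
    lastCheckMs = nowMs;
    checked.addAll(volumeSets);
    return true;
  }

  public static void main(String[] args) {
    MinGapSketch perCall = new MinGapSketch();
    perCall.checkAllVolumes("data", 0L);
    perCall.checkAllVolumes("ratis", 5L); // a few ms later, inside the gap
    System.out.println("gap per call: " + perCall.checked);

    MinGapSketch perPass = new MinGapSketch();
    perPass.checkAllVolumeSets(List.of("data", "ratis"), 0L);
    System.out.println("gap per pass: " + perPass.checked);
  }
}
```

In this model the per-call variant only ever records the first volume set,
while the per-pass variant records both.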
There is another problem. There are 2 volume checkers implemented in the
datanode:
* Periodic Volume Checker
* On-demand Volume Checker (HDDS-5089)
The periodic volume checker is scheduled at a fixed rate, 15 mins by default,
but 'disk.check.min.gap' is also 15 mins by default, and it also controls the
time gap between 2 successive checks of a single volume. So within the 15 mins
between 2 periodic checks, no on-demand check can happen.
To fix this, we could make 'periodic.disk.check.interval.minutes' longer, such
as 1 hour. Since we have the on-demand disk checker, this should be fine.
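The interaction between the two settings can be illustrated with a small
simulation. This is only a sketch under simplifying assumptions: time advances
in whole minutes, the periodic check fires exactly on schedule, and both
checkers share one per-volume timestamp. The class `CheckerIntervalSketch`, the
method `onDemandChecksRun` and the request times are hypothetical.

```java
/** Hypothetical simulation of one volume; not the actual Ozone code. */
public class CheckerIntervalSketch {

  /**
   * Counts how many on-demand requests actually run before horizonMin.
   * onDemandAtMin must be sorted ascending, with values in [0, horizonMin).
   */
  static int onDemandChecksRun(int periodicIntervalMin, int minGapMin,
                               int[] onDemandAtMin, int horizonMin) {
    int lastCheck = -1; // -1 means the volume has not been checked yet
    int ran = 0;
    int next = 0;       // index of the next pending on-demand request

    for (int t = 0; t < horizonMin; t++) {
      // Periodic check: fires at a fixed rate, subject to the min gap.
      if (t % periodicIntervalMin == 0
          && (lastCheck < 0 || t - lastCheck >= minGapMin)) {
        lastCheck = t;
      }
      // On-demand requests arriving at minute t, subject to the same gap.
      while (next < onDemandAtMin.length && onDemandAtMin[next] == t) {
        if (lastCheck < 0 || t - lastCheck >= minGapMin) {
          lastCheck = t;
          ran++;
        }
        next++;
      }
    }
    return ran;
  }

  public static void main(String[] args) {
    int[] requests = {7, 20, 40, 55}; // minutes at which on-demand checks are requested

    // Interval equal to the gap: every request lands inside the gap.
    System.out.println(onDemandChecksRun(15, 15, requests, 60));

    // Interval of 1 hour with a 15 min gap: later requests can run.
    System.out.println(onDemandChecksRun(60, 15, requests, 60));
  }
}
```

With the interval equal to the gap, none of the requests in this model run;
with a 1 hour interval, only the request that arrives within 15 mins of the
periodic check is skipped.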
> Fix skipped volume check due to disk.check.min.gap
> --------------------------------------------------
>
> Key: HDDS-5394
> URL: https://issues.apache.org/jira/browse/HDDS-5394
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major