[
https://issues.apache.org/jira/browse/HDDS-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Rose updated HDDS-13091:
------------------------------
Description:
If using a time-based sliding window to determine volume failure, a few configs
need to be set accordingly:
* Minimum scan gap between repeated volume scans.
** This is currently set to 10 minutes, which is probably too high given that
volume scans are cheap.
* The interval at which the background volume scanner runs.
** This is currently set to 1 hour.
* The bandwidth of the container data scanner.
** This affects the rate at which it can scan containers.
** Each unhealthy container will trigger an on-demand volume scan.
** This is currently set to 5 MB/sec, which means we can expect one container
scan result roughly every 17 minutes (assuming full containers of about 5 GB).
* The number of failed checks over a fixed time interval that is required to
fail a volume.
** This number must be set such that volume scans triggered either by the
background volume scanner or unhealthy containers from the container scanner
have a chance to mark the volume as failed.
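
To make the interaction between these settings concrete, here is a rough
sketch in Python. It is not Ozone code: the class and function names are
hypothetical, the 5 GB container size is an assumption, and the window is
modeled simply as "at least N failed checks within the last W seconds".

```python
from collections import deque

MB = 1024 * 1024
GB = 1024 * MB


def container_scan_seconds(container_bytes, bandwidth_bytes_per_sec):
    """Time to read one full container at the scanner's bandwidth limit."""
    return container_bytes / bandwidth_bytes_per_sec


def max_checks_in_window(window_sec, min_gap_sec):
    """Upper bound on volume checks that fit in one window when consecutive
    checks must be at least min_gap_sec apart (window edges inclusive)."""
    return int(window_sec // min_gap_sec) + 1


class SlidingWindowFailureCounter:
    """Hypothetical time-based sliding window: the volume counts as failed
    once at least `threshold` failed checks lie within the last `window_sec`."""

    def __init__(self, window_sec, threshold):
        self.window_sec = window_sec
        self.threshold = threshold
        self._failures = deque()  # timestamps of failed checks, oldest first

    def record_failure(self, now_sec):
        self._failures.append(now_sec)
        self._expire(now_sec)

    def is_failed(self, now_sec):
        self._expire(now_sec)
        return len(self._failures) >= self.threshold

    def _expire(self, now_sec):
        # Drop failures that are older than the window.
        while self._failures and now_sec - self._failures[0] > self.window_sec:
            self._failures.popleft()


if __name__ == "__main__":
    # Assumed 5 GB container at 5 MB/sec: about 17 minutes per container.
    print(container_scan_seconds(5 * GB, 5 * MB) / 60)

    # With a 10 minute minimum gap, at most 7 checks fit in a 1 hour window.
    print(max_checks_in_window(3600, 600))

    # Hourly background scans alone never reach 2 failures in a 30 minute window.
    hourly = SlidingWindowFailureCounter(window_sec=1800, threshold=2)
    for t in (0, 3600, 7200):
        hourly.record_failure(t)
        print(t, hourly.is_failed(t))
```

The point of the sketch is that the failure threshold has to stay at or below
the number of checks that can realistically land inside one window. If only
the hourly background scanner is producing checks, a window shorter than the
scan interval can never collect two failures, no matter how unhealthy the
volume is.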
was:
If using a time-based sliding window to determine volume failure, a few configs
need to be set accordingly:
* Minimum scan gap between repeated volume scans.
** This is currently set to 15 minutes, which is probably too high given that
volume scans are cheap.
* The interval that the background volume scanner runs
** This is currently set to 1 hour.
* The bandwidth of the container data scanner
** This affects the rate at which it can scan containers.
** Each unhealthy container will trigger an on-demand volume scan.
** This is currently set to 5mb/sec which means we can expect one container
scan result every 17 minutes.
* The number of failed checks over a fixed time interval that is required to
fail a volume.
** This number must be set such that volume scans triggered either by the
background volume scanner or unhealthy containers from the container scanner
have a chance to mark the volume as failed.
> Improve default configurations for container and volume scan windows and
> intervals
> ----------------------------------------------------------------------------------
>
> Key: HDDS-13091
> URL: https://issues.apache.org/jira/browse/HDDS-13091
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Priority: Major