[jira] [Updated] (HDDS-13091) Improve default configurations for container and volume scan windows and intervals

Ethan Rose (Jira) Thu, 07 Aug 2025 07:39:04 -0700


     [ https://issues.apache.org/jira/browse/HDDS-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ethan Rose updated HDDS-13091:
------------------------------
    Description: 
If a time-based sliding window is used to determine volume failure, several configs need to be tuned accordingly:
 * Minimum scan gap between repeated volume scans.
 ** This is currently set to 10 minutes, which is probably too high given that volume scans are cheap.
 * The interval at which the background volume scanner runs.
 ** This is currently set to 1 hour.
 * The bandwidth of the container data scanner.
 ** This affects the rate at which it can scan containers.
 ** Each unhealthy container will trigger an on-demand volume scan.
 ** This is currently set to 5 MB/s, which means we can expect one container scan result roughly every 17 minutes (assuming full 5 GB containers).
 * The number of failed checks within a fixed-length time window that is required to fail a volume.
 ** This number must be set so that volume scans triggered either by the background volume scanner or by unhealthy containers found by the container scanner have a chance to mark the volume as failed.
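The arithmetic and the sliding-window check described above can be sketched in Python. This is an illustrative sketch only, not Ozone's implementation. It assumes full 5 GB containers, binary units (1 MB = 1024^2 bytes), and a hypothetical 1-hour failure window (the 1 hour in the description is the background scanner interval, not the window). The helper names `container_scan_seconds`, `max_checks_in_window` and `SlidingWindowFailureTracker` are hypothetical and do not correspond to Ozone classes or configuration keys.

```python
from collections import deque


def container_scan_seconds(container_bytes: int, bandwidth_bytes_per_sec: int) -> float:
    """Seconds needed to scan one full container at a fixed scan bandwidth."""
    return container_bytes / bandwidth_bytes_per_sec


def max_checks_in_window(window_seconds: float, min_gap_seconds: float) -> int:
    """Upper bound on volume checks inside one window when consecutive
    checks must be at least min_gap_seconds apart.

    Checks at t, t + gap, ..., t + k * gap all fit while k * gap <= window.
    """
    return int(window_seconds // min_gap_seconds) + 1


class SlidingWindowFailureTracker:
    """Hypothetical tracker: fail a volume once `failures_to_fail` failed
    checks fall within the last `window_seconds`. Not Ozone's implementation."""

    def __init__(self, window_seconds: float, failures_to_fail: int) -> None:
        self.window_seconds = window_seconds
        self.failures_to_fail = failures_to_fail
        self._failures: deque = deque()  # timestamps of failed checks, oldest first

    def record_failure(self, now: float) -> bool:
        """Record a failed check at time `now` (seconds, non-decreasing);
        return True if the volume should now be marked as failed."""
        self._failures.append(now)
        # Evict failures that have slid out of the window. The entry just
        # appended is always kept, so the deque never empties here.
        while now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.failures_to_fail


if __name__ == "__main__":
    GB, MB = 1024**3, 1024**2
    secs = container_scan_seconds(5 * GB, 5 * MB)
    print(f"{secs:.0f} s = {secs / 60:.1f} min per container")  # 1024 s = 17.1 min per container
    # Assumed 1-hour window: current 10-minute gap vs. previous 15-minute gap.
    print(max_checks_in_window(3600, 10 * 60))  # 7
    print(max_checks_in_window(3600, 15 * 60))  # 5
```

Under these assumptions, a 10-minute minimum gap allows at most 7 volume checks in a 1-hour window (5 with the previous 15-minute gap), so a failure threshold above that bound could never be reached by repeated volume scans alone.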

  was:
If a time-based sliding window is used to determine volume failure, several configs need to be tuned accordingly:
* Minimum scan gap between repeated volume scans.
** This is currently set to 15 minutes, which is probably too high given that volume scans are cheap.
* The interval at which the background volume scanner runs.
** This is currently set to 1 hour.
* The bandwidth of the container data scanner.
** This affects the rate at which it can scan containers.
** Each unhealthy container will trigger an on-demand volume scan.
** This is currently set to 5 MB/s, which means we can expect one container scan result roughly every 17 minutes (assuming full 5 GB containers).
* The number of failed checks within a fixed-length time window that is required to fail a volume.
** This number must be set so that volume scans triggered either by the background volume scanner or by unhealthy containers found by the container scanner have a chance to mark the volume as failed.


> Improve default configurations for container and volume scan windows and intervals
> ----------------------------------------------------------------------------------
>
>                 Key: HDDS-13091
>                 URL: https://issues.apache.org/jira/browse/HDDS-13091
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
