[ https://issues.apache.org/jira/browse/CASSANDRA-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519696#comment-17519696 ]

Andres de la Peña edited comment on CASSANDRA-17150 at 4/8/22 4:47 PM:
-----------------------------------------------------------------------

The main motivation for this guardrail is to reject user writes when the disk 
is so full that a compaction or streaming operation could fill it completely. So 
we deliberately make user writes fail with the guardrail to avoid a later, real 
failure of internal writes due to a full disk. 

For example, if we estimate that our compaction strategy can double the size of 
the data directories, we would set a disk usage threshold below 50%. If we 
don't set {{data_disk_usage_max_disk_size}}, the calculation will be based on 
the amount of free space. Conversely, if we set 
{{data_disk_usage_max_disk_size}} and set the guardrail to 100%, it should 
work as a "how much data Cassandra can hold" guardrail.
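As a rough sizing aid, assume the usage ratio is 
{{data_directories_size/(data_directories_size + free_space)}} and that compaction 
or streaming may temporarily grow the data directories to k times their current 
size. The extra (k - 1) * data_directories_size must then fit in the free space, 
which bounds the ratio at 1/k. The helper below is hypothetical (not part of 
Cassandra) and only turns that bound into a threshold percentage; for the doubling 
case (k = 2) it gives 50%, so in practice one would pick a value strictly below it 
to keep a margin.

```java
/**
 * Hypothetical sizing helper, not part of Cassandra.
 * Assumes usage = data / (data + free) and that compaction/streaming can
 * temporarily grow the data directories to `amplification` times their size.
 */
public class ThresholdEstimate {

    /**
     * Compaction needs (amplification - 1) * data of free space, so
     * free >= (amplification - 1) * data, i.e. data / (data + free) <= 1 / amplification.
     * Returns that upper bound as a percentage; choose a threshold below it.
     */
    static double maxSafeThresholdPercent(double amplification) {
        if (amplification <= 1.0) {
            throw new IllegalArgumentException("amplification must be > 1, got " + amplification);
        }
        return 100.0 / amplification;
    }

    public static void main(String[] args) {
        // Compaction that can double the data directories: bound is 50%.
        System.out.println(maxSafeThresholdPercent(2.0)); // 50.0
        // A strategy assumed to need at most 1.5x: bound is about 66.7%.
        System.out.println(maxSafeThresholdPercent(1.5));
    }
}
```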

The previous way of calculating the usage ratio, based on disk used vs. free 
space, only worked well if the data directories were alone on the partition. If 
there was any other data on it, {{data_disk_usage_max_disk_size}} worked as a 
global disk usage limit, and the threshold percentages were hard to correlate 
with the growth of Cassandra data, since they included other data on disk.

I have modified the disk usage calculation to use the actual size of the 
data directories instead of the size of all data on disk. The usage ratio is 
then obtained by dividing that value by the sum of that same value plus the 
available space on disk. So it's 
{{data_directories_size/(data_directories_size + free_space)}}, or just 
{{data_directories_size/data_disk_usage_max_disk_size}} if that property is 
manually specified. This should make it possible to use either the free space 
on disk or a fixed size in a more predictable way.
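A minimal sketch of the two formulas above, assuming sizes are given in bytes and 
that the guardrail compares ratio * 100 against its percentage thresholds. The 
class and method names are hypothetical and do not reflect Cassandra's actual 
implementation.

```java
import java.util.OptionalLong;

/** Hypothetical illustration of the usage ratio; not Cassandra's real API. */
public class DiskUsageRatio {

    /**
     * @param dataDirectoriesSize bytes occupied by the Cassandra data directories
     * @param freeSpace           bytes still available on the disk holding them
     * @param maxDiskSize         data_disk_usage_max_disk_size in bytes, if configured
     */
    static double usageRatio(long dataDirectoriesSize, long freeSpace, OptionalLong maxDiskSize) {
        if (maxDiskSize.isPresent()) {
            long max = maxDiskSize.getAsLong();
            if (max <= 0) {
                throw new IllegalArgumentException("max disk size must be positive, got " + max);
            }
            // Fixed budget: data_directories_size / data_disk_usage_max_disk_size
            return (double) dataDirectoriesSize / max;
        }
        // Free-space based: data_directories_size / (data_directories_size + free_space)
        return (double) dataDirectoriesSize / (dataDirectoriesSize + freeSpace);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 100 GiB of Cassandra data with 300 GiB free: 100 / (100 + 300)
        System.out.println(usageRatio(100 * gib, 300 * gib, OptionalLong.empty())); // 0.25
        // Same data against a configured 200 GiB budget: 100 / 200
        System.out.println(usageRatio(100 * gib, 300 * gib, OptionalLong.of(200 * gib))); // 0.5
    }
}
```

Unrelated files on the same partition no longer count towards the numerator; they 
only affect the free-space variant by reducing {{free_space}}.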

||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/1546]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1467/workflows/92bc517c-c9dd-4ca2-9d80-c649bda65e07] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1467/workflows/8d588f25-6ad5-44a5-ad96-f45a37b2db08]|

> Guardrails for disk usage
> -------------------------
>
>                 Key: CASSANDRA-17150
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17150
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 4.x
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Add guardrails for disk usage establishing soft/hard limits on the percentage 
> of used disk space. For example:
> {code}
> # Warning threshold to warn when local disk usage exceeds threshold. Valid values: (1, 100]
> # Defaults to -1 to disable.
> # disk_usage_percentage_warn_threshold: -1
> # Failure threshold to reject write requests if replica disk usage exceeds threshold. Valid values: (1, 100]
> # Defaults to -1 to disable.
> # disk_usage_percentage_failure_threshold: -1
> {code}
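A minimal sketch of how the two proposed thresholds could be evaluated against a 
usage percentage, assuming -1 disables a threshold, enabled values must lie in 
(1, 100], and "exceeds" means strictly greater than. The class, enum and method 
names are hypothetical and not part of Cassandra.

```java
/** Hypothetical evaluation of the proposed warn/failure thresholds. */
public class DiskUsageGuardrailSketch {

    enum Outcome { OK, WARN, FAIL }

    static final int DISABLED = -1;

    /** Rejects values outside (1, 100] unless the threshold is disabled. */
    static void validate(String name, int threshold) {
        if (threshold != DISABLED && (threshold <= 1 || threshold > 100)) {
            throw new IllegalArgumentException(name + " must be -1 or in (1, 100], got " + threshold);
        }
    }

    static Outcome check(double usagePercent, int warnThreshold, int failureThreshold) {
        validate("disk_usage_percentage_warn_threshold", warnThreshold);
        validate("disk_usage_percentage_failure_threshold", failureThreshold);
        if (failureThreshold != DISABLED && usagePercent > failureThreshold) {
            return Outcome.FAIL; // the write request would be rejected
        }
        if (warnThreshold != DISABLED && usagePercent > warnThreshold) {
            return Outcome.WARN; // the write goes through, but a warning is emitted
        }
        return Outcome.OK;
    }

    public static void main(String[] args) {
        System.out.println(check(45.0, 40, 50)); // WARN
        System.out.println(check(55.0, 40, 50)); // FAIL
        System.out.println(check(55.0, DISABLED, DISABLED)); // OK
    }
}
```

The failure check runs first so that a usage above both thresholds is reported as 
a failure rather than just a warning.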



--
This message was sent by Atlassian Jira
(v8.20.1#820001)