[ 
https://issues.apache.org/jira/browse/CASSANDRA-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494597#comment-17494597
 ] 

Andres de la Peña commented on CASSANDRA-17153:
-----------------------------------------------

Here is the patch adding the proposed guardrails:
||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/1459]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/1308/workflows/20d6dd7f-a453-4a47-84d8-f5fc85d3aa66] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/1308/workflows/19736dab-b07f-4fe1-8cac-ab62eec9aa78]|

Unfortunately, we cannot know the data size or the number of items of a 
non-frozen collection at write time, because part of the collection could have 
already been written, and we don't want to do a read before write. Thus, the 
suggested approach is:
 * At write time the guardrails only check the size of the collection fragment 
that is written, without checking for any additional fragments of the 
collection that might be already stored. This check is done for both frozen and 
non-frozen collections.
 * At sstable write time the guardrails are checked again for every non-frozen 
collection, so we can detect collections over the thresholds when they are 
written to disk or compacted. In this case the only action taken by the 
guardrail is emitting a warn/error log message. No client warning is sent 
because this happens asynchronously, and no exception is raised because there 
isn't a specific query to abort and we don't want to interrupt the process 
triggering the guardrail.
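
To illustrate the two check points described above, here is a minimal sketch of the items-per-collection case. It is not the code in the patch: the class name, method names, message texts and the in-memory lists standing in for client warnings and the server log are all hypothetical, and it assumes a negative threshold means "disabled", as in the issue description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two-stage check; not Cassandra's actual Guardrails API. */
final class CollectionItemsGuardrailSketch {
    private final int warnThreshold; // assumption: negative value disables the warning
    private final int failThreshold; // assumption: negative value disables the failure

    // Stand-ins for the client warnings attached to a response and for the server log.
    final List<String> clientWarnings = new ArrayList<>();
    final List<String> logLines = new ArrayList<>();

    CollectionItemsGuardrailSketch(int warnThreshold, int failThreshold) {
        this.warnThreshold = warnThreshold;
        this.failThreshold = failThreshold;
    }

    private static boolean exceeds(int value, int threshold) {
        return threshold >= 0 && value > threshold;
    }

    /**
     * Query-time check. Only the fragment being written is visible here,
     * so items already stored for the same collection are not counted.
     */
    void checkAtWriteTime(String column, int fragmentItems) {
        if (exceeds(fragmentItems, failThreshold)) {
            // There is a query to abort, so the write is rejected.
            throw new IllegalArgumentException(
                column + " has " + fragmentItems + " items, above the fail threshold " + failThreshold);
        }
        if (exceeds(fragmentItems, warnThreshold)) {
            clientWarnings.add(
                column + " has " + fragmentItems + " items, above the warn threshold " + warnThreshold);
        }
    }

    /**
     * Flush/compaction-time check. The whole collection is visible, but there is
     * no query to abort and no client to warn, so the only action is a log line.
     */
    void checkAtSSTableWriteTime(String column, int totalItems) {
        if (exceeds(totalItems, failThreshold)) {
            logLines.add("ERROR " + column + " has " + totalItems + " items");
        } else if (exceeds(totalItems, warnThreshold)) {
            logLines.add("WARN " + column + " has " + totalItems + " items");
        }
    }
}
```

With thresholds of, say, 2 (warn) and 4 (fail), a write of 3 items would add a client warning, a write of 5 items would be rejected, and a stored collection of 5 items seen at flush or compaction would only produce an error log line.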

As for testing, the tests for each new guardrail are split between unit tests 
and dtests. The unit tests are used for the query-time guardrail check, and the 
dtests are used for the flush/compact-time check. Dtests are used because they 
have utilities that allow us to easily check the logs and because we want to 
check that those log messages are printed on the replicas, and not only on the 
coordinator.

> Guardrails for collection items and size
> ----------------------------------------
>
>                 Key: CASSANDRA-17153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17153
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>
> Add guardrails for the number of items and size of collections. For example:
> {code}
> # Guardrail to warn or fail when encountering larger size of collection data than threshold.
> # The two thresholds default to 0KiB to disable.
> collection_size_warn_threshold: 0KiB
> collection_size_fail_threshold: 0KiB
> # Guardrail to warn or fail when encountering more elements in collection than threshold.
> # The two thresholds default to -1 to disable.
> items_per_collection_warn_threshold: -1
> items_per_collection_fail_threshold: -1
> {code}



