Caleb Rackliffe created CASSANDRA-15907:
-------------------------------------------

             Summary: Operational Improvements & Hardening for Replica 
Filtering Protection
                 Key: CASSANDRA-15907
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15907
             Project: Cassandra
          Issue Type: Improvement
          Components: Consistency/Coordination, Feature/2i Index
            Reporter: Caleb Rackliffe


CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i 
and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a 
few things we should follow up on, however, to make life a bit easier for 
operators and generally de-risk usage:

(Note: Line numbers are based on {{trunk}} as of 
{{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.)

*Minor Optimizations*

* {{ReplicaFilteringProtection:114}} - Given that we size them up-front, we may 
be able to use simple arrays instead of lists for {{rowsToFetch}} and 
{{originalPartitions}}. Alternatively (or additionally), we may be able to null 
out references in these two collections more aggressively, e.g. using 
{{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}}, 
assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}. (A rough 
sketch of this pattern follows the list.)
* {{ReplicaFilteringProtection:323}} - We may be able to use 
{{EncodingStats.merge()}} and remove the custom {{stats()}} method.
* {{DataResolver:111 & 228}} - Cache an instance of 
{{UnaryOperator#identity()}} instead of creating one on the fly.
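
To make the first bullet concrete, here is a minimal, self-contained sketch of 
the null-out idea. The types and method shapes below are stand-ins, not the 
real {{ReplicaFilteringProtection}} internals; only the {{ArrayList#set()}} 
pattern is the point:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class NullOutSketch
{
    // Stand-ins for the real per-source structures kept by
    // ReplicaFilteringProtection; only the List handling matters here.
    static class RowsToFetch {}
    static class Partitions {}

    private final List<RowsToFetch> rowsToFetch;

    NullOutSketch(int sources)
    {
        // Sized up front, as noted above; a plain array would also work.
        rowsToFetch = new ArrayList<>(sources);
        for (int i = 0; i < sources; i++)
            rowsToFetch.add(new RowsToFetch());
    }

    Partitions queryProtectedPartitions(int source)
    {
        // ArrayList#set() returns the previous element, so one call both
        // reads the reference and drops it from the list, letting the GC
        // reclaim it as soon as querySourceOnKey() finishes with it.
        RowsToFetch toFetch = rowsToFetch.set(source, null);
        return querySourceOnKey(source, toFetch);
    }

    private Partitions querySourceOnKey(int source, RowsToFetch toFetch)
    {
        return new Partitions(); // placeholder for the real per-source query
    }
}
{code}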

*Documentation and Intelligibility*

* There are a few places (CHANGES.txt, tracing output in 
{{ReplicaFilteringProtection}}, etc.) where we mention "replica-side filtering 
protection" (which makes it seem like the coordinator doesn't filter) rather 
than "replica filtering protection" (which sounds more like what we actually 
do, i.e. protect ourselves against incorrect replica filtering results). It's 
a minor fix, but it would avoid confusion.
* The method call chain in {{DataResolver}} might be a bit simpler if we put 
the {{repairedDataTracker}} in {{ResolveContext}}. (A rough sketch of that 
shape follows.)
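
For illustration only, a simplified sketch of that refactor. These are 
hypothetical, stripped-down class shapes, not the actual {{DataResolver}} 
internals:

{code:java}
// Simplified stand-ins; the real DataResolver and ResolveContext carry much
// more state. The point is that the tracker rides along with the context
// instead of being threaded through each method in the resolve call chain.
class RepairedDataTracker {}

class ResolveContext
{
    final RepairedDataTracker repairedDataTracker;

    ResolveContext(RepairedDataTracker repairedDataTracker)
    {
        this.repairedDataTracker = repairedDataTracker;
    }
}

class DataResolver
{
    void resolveInternal(ResolveContext context)
    {
        // Reached via the context rather than an extra parameter.
        RepairedDataTracker tracker = context.repairedDataTracker;
        // ... merge replica responses, recording repaired data via tracker ...
    }
}
{code}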

*Guardrails*

* As it stands, we don't have a way to enforce an upper bound on the memory 
usage of {{ReplicaFilteringProtection}}, which caches row responses from the 
first round of requests. (Remember, these are later merged with the second 
round of results to complete the data for filtering.) Operators will likely 
need a way to protect themselves, i.e. simply failing queries that hit a 
particular threshold rather than GCing nodes into oblivion. The fun question 
is how we do this, with the primary axes being scope (per-query, global, etc.) 
and granularity (per-partition, per-row, per-cell, actual heap usage, etc.). 
My starting disposition on the right trade-off between performance/complexity 
and accuracy is something along the lines of a limit on cached rows per query, 
sketched below.
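
To make that concrete, a minimal sketch of what a per-query cached-rows 
guardrail could look like. All names here are hypothetical (no such class or 
configuration option exists yet); it just illustrates the fail-at-threshold 
behavior:

{code:java}
// Hypothetical sketch, not an existing Cassandra API. A counter like this
// could be owned by ReplicaFilteringProtection for the duration of one query.
public class CachedRowsGuardrail
{
    // Hypothetical operator-configured limit, e.g. from cassandra.yaml.
    private final int failThreshold;
    private int cachedRows = 0;

    public CachedRowsGuardrail(int failThreshold)
    {
        this.failThreshold = failThreshold;
    }

    // Invoked each time a first-round row response is cached.
    public void onRowCached()
    {
        if (++cachedRows > failThreshold)
            // Fail the query explicitly instead of letting the heap fill up.
            throw new IllegalStateException(
                String.format("Replica filtering protection has cached %d rows " +
                              "for this query, exceeding the threshold of %d",
                              cachedRows, failThreshold));
    }
}
{code}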



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
