[ 
https://issues.apache.org/jira/browse/CASSANDRA-19958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886482#comment-17886482
 ] 

Stefan Miklosovic commented on CASSANDRA-19958:
-----------------------------------------------

[~chovatia.jayd...@gmail.com]  don't we want to control this at runtime? I can 
imagine having an MBean method on StorageService / StorageProxy so we could 
resize that pool dynamically if one sees it necessary. By lowering the thread 
pool size dynamically, we could basically "throttle" hint submissions, which 
means we might prioritize other operations. If we are in a hurry and want to 
submit all hints as fast as possible, we might set it higher so everything is 
written sooner.
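For what it's worth, the JDK side of this is straightforward: ThreadPoolExecutor pool sizes can be changed while the pool is running. A minimal sketch of the idea below uses a plain ThreadPoolExecutor stand-in rather than Cassandra's actual Stage/StorageService wiring; the class and method names are hypothetical, not a proposed MBean API.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ResizableStage
{
    // Stand-in for the mutation stage's executor. Cassandra's real Stage
    // plumbing differs; this only shows that JDK pools resize at runtime.
    private final ThreadPoolExecutor executor =
        new ThreadPoolExecutor(10, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

    // Hypothetical MBean-style operation: lower it to throttle hint
    // submissions, raise it to drain hints faster.
    public void setConcurrency(int threads)
    {
        // Order matters: core size must never exceed max size, so grow
        // max first when increasing and shrink core first when decreasing.
        if (threads > executor.getMaximumPoolSize())
        {
            executor.setMaximumPoolSize(threads);
            executor.setCorePoolSize(threads);
        }
        else
        {
            executor.setCorePoolSize(threads);
            executor.setMaximumPoolSize(threads);
        }
    }

    public int getConcurrency()
    {
        return executor.getCorePoolSize();
    }
}
```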

> Local Hints are stepping on local mutations
> -------------------------------------------
>
>                 Key: CASSANDRA-19958
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19958
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Jaydeepkumar Chovatia
>            Priority: Normal
>         Attachments: image-2024-09-26-15-28-20-435.png
>
>
> Cassandra uses the same queue (Stage.MUTATION) to process local mutations as 
> well as local hints writing. CASSANDRA-19534 has enhanced and added timeouts 
> for local mutations, but local hint writing does not honor that timeout by 
> design as it honors a different timeout, i.e. _max_hint_window_in_ms_
>  
> *The Problem*
> Let's understand the problem with a five-node Cassandra cluster N1, N2, 
> N3, N4, N5 with the following configuration:
>  * concurrent_writes: 10
>  * native_transport_timeout: 5s
>  * write_request_timeout_in_ms: 2000 // 2 seconds
> +StorageProxy.java snippet...+
>  
> !image-2024-09-26-15-28-20-435.png|width=600,height=200!
>  
> Let's assume N4 and N5 are slow, flapping, or down, and that N1 receives a 
> flurry of mutations. This is what happens on N1:
>  # Line no 1542: Append 100 hints to the Stage.Mutation queue 
>  # Line no 1547: Append 100 local mutations to the Stage.Mutation queue 
>  The Stage.Mutation queue on N1 would then look as follows:
> {code:java}
> hint1,hint2,hint3,....hint100,mutation1,mutation2,....mutation100 {code}
>  * Assume each hint runnable takes 1 second; with concurrent_writes at 10, it 
> will take 10 seconds to process 100 hints, and only after that will the local 
> mutations be processed. 
>  
> So, in production, N1 would appear inactive for almost 10 seconds: it is just 
> writing hints locally and not participating in any quorum, etc.
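The 10-second figure follows directly from the shared-queue model above. A deterministic toy model (deliberately not Cassandra code; all names here are made up for illustration) that computes when the first mutation finally gets a worker thread:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HeadOfLineDemo
{
    // Model of a shared FIFO stage: `workers` threads pull tasks that each
    // take `taskSeconds`. Returns the second at which the first mutation
    // starts executing, given `hints` hint tasks queued ahead of it.
    static int firstMutationStartSecond(int hints, int workers, int taskSeconds)
    {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < hints; i++)
            queue.add("hint");
        queue.add("mutation1");

        int started = 0;
        while (!queue.isEmpty())
        {
            String task = queue.poll();
            // Workers free up in batches of `workers` every `taskSeconds`,
            // so the k-th task starts at floor(k / workers) * taskSeconds.
            int startTime = (started / workers) * taskSeconds;
            if (task.equals("mutation1"))
                return startTime;
            started++;
        }
        return -1;
    }
}
```

With 100 hints, 10 workers, and 1-second hint runnables, the first mutation does not start for 10 seconds, matching the scenario described above.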
>  
> The problem becomes much worse under high load: if hints pile up to 1M, N1 
> will choke. The only remedy at that point is for an operator to restart N1 
> to drain all the piled-up hints from the Stage.Mutation queue.
>  
> The above problem happens because local hint writing and local mutations 
> both use the same queue, i.e., Stage.Mutation.
> Local mutation writing is in the hot path, whereas a slight delay in local 
> hint writing does not cause much trouble.
>  
> *Reproducible steps*
>  # Pull the latest 4.1.x release
>  # Create a 5-node cluster
>  # Set the following configuration
> {code:java}
> native_transport_timeout: 10s
> write_request_timeout_in_ms: 2000
> enforce_native_deadline_for_hints: true{code}
>  # Inject 1s of latency inside the following API in _StorageProxy.java_ on 
> all five nodes:
> {code:java}
> private static void performLocally(Stage stage, Replica localReplica, final 
> Runnable runnable, final RequestCallback<?> handler, Object description, 
> Dispatcher.RequestTime requestTime)
> {
>     stage.maybeExecuteImmediately(new LocalMutationRunnable(localReplica, 
> requestTime)
>     {
>         public void runMayThrow()
>         {
>             try
>             {
>                 Thread.sleep(1000); // Inject latency here
>                 runnable.run();
>                 handler.onResponse(null);
>             }
>             catch (Exception ex)
>             {
>                 if (!(ex instanceof WriteTimeoutException))
>                     logger.error("Failed to apply mutation locally : ", ex);
>                 handler.onFailure(FBUtilities.getBroadcastAddressAndPort(), 
> RequestFailureReason.forException(ex));
>             }
>         }
>         @Override
>         public String description()
>         {
>             // description is an Object and toString() called so we do not 
> have to evaluate the Mutation.toString()
> unless explicitly checked
>             return description.toString();
>         }
>         @Override
>         protected Verb verb()
>         {
>             return Verb.MUTATION_REQ;
>         }
>     });
> } {code}
>  # Run write-only stress for 1 hour or so
>  # You will see the Stage.Mutation queue pile up to >1 million entries
>  # Stop the load
>  # The Stage.Mutation queue will not clear immediately, and you cannot 
> perform new writes. Basically, at this point the Cassandra cluster has become 
> inoperable from a new-mutations point of view; only reads will be served
>  
> *Solution*
> The solution is to segregate the local mutation queue from the local hint 
> writing queue. Here is the PR: 
> [https://github.com/apache/cassandra/pull/3580]
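For illustration only (the actual change lives in Cassandra's Stage machinery; the class and method names here are hypothetical, not the PR's API), the idea of segregated queues boils down to giving local hint writes their own executor so they can never queue ahead of local mutations:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegregatedQueues
{
    // Separate pools: slow local hint writes can no longer sit ahead of
    // local mutations in one shared FIFO. Pool sizes are illustrative only.
    private final ExecutorService mutationStage = Executors.newFixedThreadPool(10);
    private final ExecutorService hintStage = Executors.newFixedThreadPool(2);

    public Future<?> submitMutation(Runnable mutation)
    {
        // Hot path: depends only on other mutations, never on hint backlog.
        return mutationStage.submit(mutation);
    }

    public Future<?> submitHint(Runnable hintWrite)
    {
        // A pile-up here delays hints, not quorum participation.
        return hintStage.submit(hintWrite);
    }

    public void shutdown()
    {
        mutationStage.shutdown();
        hintStage.shutdown();
    }
}
```

With this split, even a 1M-deep hint backlog only delays hint persistence; mutations keep flowing through their own stage.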
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
