[ 
https://issues.apache.org/jira/browse/CASSANDRA-19958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaydeepkumar Chovatia updated CASSANDRA-19958:
----------------------------------------------
    Summary: Local Hints are stepping on local mutations  (was: Hints are 
stepping on online mutations)

> Local Hints are stepping on local mutations
> -------------------------------------------
>
>                 Key: CASSANDRA-19958
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19958
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Jaydeepkumar Chovatia
>            Priority: Normal
>         Attachments: image-2024-09-26-15-28-20-435.png
>
>
> Cassandra uses the same queue (Stage.MUTATION) to process both local 
> mutations and local hint writes. CASSANDRA-19534 added timeouts for local 
> mutations, but local hint writing does not honor that timeout by design; it 
> honors a different timeout, i.e. _max_hint_window_in_ms_
>  
> *The Problem*
> Consider a five-node Cassandra cluster (N1, N2, N3, N4, N5) with the 
> following configuration:
>  * concurrent_writes: 10
>  * native_transport_timeout: 5s 
>  * write_request_timeout_in_ms: 2000 //2 seconds
>  
> +StorageProxy.java snippet...+
>  
> !image-2024-09-26-15-28-20-435.png|height=200,width=600!
>  
> Let's assume N4 and N5 are slow, flapping, or down. Assume N1 receives a 
> flurry of mutations, so this is what happens on N1:
>  # Line no 1542: Append 100 hints to the Stage.Mutation queue 
>  # Line no 1547: Append 100 local mutations to the Stage.Mutation queue 
>  The Stage.Mutation queue on N1 would then look as follows:
> {code:java}
> hint1,hint2,hint3,....hint100,mutation1,mutation2,....mutation100 {code}
>  * Assume each hint runnable takes 1 second. With concurrent_writes set to 
> 10, it will take 10 seconds to process the 100 hints (100 hints x 1 s / 10 
> threads), and only after that will the local mutations be processed. 
>  
> So, in production, N1 would appear inactive for almost 10 seconds, as it is 
> only writing hints locally and not participating in any quorum.
>  
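The 10-second figure above can be checked with a small model. This is only a sketch of FIFO dispatch to a fixed number of worker threads, not Cassandra code: the class and method names are hypothetical, and the 1 ms cost per local mutation is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical model: a FIFO queue drained by a fixed number of worker threads.
public class HeadOfLineDelay {

    // Returns the time (in ms) at which the task at index `target` starts running,
    // assuming tasks are dispatched in queue order to the earliest free worker.
    static long startTimeMs(long[] durationsMs, int workers, int target) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < workers; i++) {
            freeAt.add(0L); // every worker is idle at time 0
        }
        for (int i = 0; i < durationsMs.length; i++) {
            long start = freeAt.poll(); // earliest moment any worker becomes free
            if (i == target) {
                return start;
            }
            freeAt.add(start + durationsMs[i]);
        }
        throw new IllegalArgumentException("target index is outside the queue");
    }

    public static void main(String[] args) {
        long[] queue = new long[200];
        Arrays.fill(queue, 0, 100, 1000L); // hint1..hint100, 1 s each (from the example)
        Arrays.fill(queue, 100, 200, 1L);  // mutation1..mutation100, 1 ms each (assumed)

        // concurrent_writes: 10 -> 10 workers; index 100 is mutation1
        long start = startTimeMs(queue, 10, 100);
        System.out.println("mutation1 starts after " + start + " ms"); // 10000 ms
    }
}
```

Under these assumptions mutation1 does not start until 10000 ms, well past the 2000 ms write_request_timeout_in_ms from the configuration above.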
> The problem becomes severe under high load: if hints pile up to 1M, N1 will 
> choke. The only remedy at this time is for an operator to restart N1 to 
> drain all the piled-up hints from the Stage.Mutation queue.
>  
> The problem above occurs because local hint writing and local mutations 
> both use the same queue, i.e., Stage.Mutation. Local mutation writing is on 
> the hot path, whereas a slight delay in local hint writing does not cause 
> much trouble.
>  
> *Steps to reproduce*
>  # Pull the latest 4.1.x release
>  # Create a 5-node cluster
>  # Set the following configuration
> {code:java}
> native_transport_timeout: 10s
> write_request_timeout_in_ms: 2000
> enforce_native_deadline_for_hints: true{code}
>  # Inject 1s of latency inside the following API in _StorageProxy.java_ on 
> all five nodes:
> {code:java}
> private static void performLocally(Stage stage, Replica localReplica, final 
> Runnable runnable, final RequestCallback<?> handler, Object description, 
> Dispatcher.RequestTime requestTime)
> {
>     stage.maybeExecuteImmediately(new LocalMutationRunnable(localReplica, 
> requestTime)
>     {
>         public void runMayThrow()
>         {
>             try
>             {
>                 Thread.sleep(1000); // Inject latency here
>                 runnable.run();
>                 handler.onResponse(null);
>             }
>             catch (Exception ex)
>             {
>                 if (!(ex instanceof WriteTimeoutException))
>                     logger.error("Failed to apply mutation locally : ", ex);
>                 handler.onFailure(FBUtilities.getBroadcastAddressAndPort(), 
> RequestFailureReason.forException(ex));
>             }
>         }
>         @Override
>         public String description()
>         {
>             // description is an Object and toString() is called so we do not
>             // have to evaluate the Mutation.toString()
>             // unless explicitly checked
>             return description.toString();
>         }
>         @Override
>         protected Verb verb()
>         {
>             return Verb.MUTATION_REQ;
>         }
>     });
> } {code}
>  # Run a write-only stress workload for about 1 hour
>  # You will see the Stage.Mutation queue pile up to more than 1 million 
> entries
>  # Stop the load
>  # Stage.Mutation will not drain immediately, and you cannot perform new 
> writes. At this point the Cassandra cluster has become inoperable from the 
> point of view of new mutations; only reads will be served
>  
> *Solution*
> The solution is to segregate the local mutation queue and local hint writing 
> queue to address the problem above.
>  
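The effect of separating the two queues can be illustrated with plain java.util.concurrent executors. This is a minimal sketch under stated assumptions, not the actual patch and not Cassandra's Stage API: the class name, method name, pool sizes, and the 50 ms hint cost are all hypothetical and chosen only to keep the demo short.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: compares one shared pool against separate hint and mutation pools.
public class SeparateStages {

    // Enqueues `hints` slow hint tasks, then one mutation, and returns how long
    // (in ms) the mutation waited in its queue before a thread picked it up.
    static long mutationWaitMs(ExecutorService hintStage, ExecutorService mutationStage,
                               int hints, long hintCostMs) {
        for (int i = 0; i < hints; i++) {
            hintStage.submit(() -> {
                try {
                    Thread.sleep(hintCostMs); // stand-in for a slow local hint write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        long submittedAt = System.nanoTime();
        Callable<Long> mutation = System::nanoTime; // records when it actually starts
        Future<Long> startedAt = mutationStage.submit(mutation);
        try {
            return TimeUnit.NANOSECONDS.toMillis(startedAt.get() - submittedAt);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // One shared pool: the mutation queues behind all the hints.
        ExecutorService shared = Executors.newFixedThreadPool(2);
        long sharedWait = mutationWaitMs(shared, shared, 8, 50);
        shared.shutdown();

        // Separate pools: hints can back up without delaying the mutation.
        ExecutorService hintStage = Executors.newFixedThreadPool(1);
        ExecutorService mutationStage = Executors.newFixedThreadPool(2);
        long separateWait = mutationWaitMs(hintStage, mutationStage, 8, 50);
        hintStage.shutdown();
        mutationStage.shutdown();

        System.out.println("shared pool wait:    " + sharedWait + " ms");
        System.out.println("separate pools wait: " + separateWait + " ms");
    }
}
```

With the shared pool the mutation waits roughly 8 x 50 ms / 2 threads = 200 ms; with separate pools it starts almost immediately, while the hints drain at their own pace.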



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
