[ 
https://issues.apache.org/jira/browse/CASSANDRA-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883674#comment-17883674
 ] 

Caleb Rackliffe edited comment on CASSANDRA-19365 at 9/23/24 3:33 AM:
----------------------------------------------------------------------

Attached my CI results, which look good. Let me make a final pass at review 
this week. (I'm guessing porting down through to 4.0 won't involve a ton of 
changes to the trunk patch...)


was (Author: maedhroz):
Attached my CI results, which look good. Let me make a final pass at review 
this week.

> invalid EstimatedHistogramReservoirSnapshot::getValue values due to race 
> condition in DecayingEstimatedHistogramReservoir
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19365
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19365
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Metrics
>            Reporter: Jakub Zytka
>            Assignee: Maxim Muzafarov
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>         Attachments: ci_summary.html, result_details.tar.gz
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> `DecayingEstimatedHistogramReservoir` has a race condition between `update` 
> and `rescaleIfNeeded`.
> A sample which ends up (`update`) in an already scaled decayingBucket 
> (`rescaleIfNeeded`) may still use a non-scaled weight because `decayLandmark` 
> has not been updated yet at the moment of `update`.
>  
> The observed consequence was flooding of the cluster with speculative retries 
> (we happened to hit low-percentile buckets with overweight samples, which 
> drove p99 below true p50 for a long time).
> Please note that despite the manifestation being similar to CASSANDRA-19330, 
> these are two distinct bugs in their own right.
> This bug affects versions 4.0 and later.
> On 3.11 there is locking in DEHR (`DecayingEstimatedHistogramReservoir`). I 
> did not check earlier versions.
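
To make the interleaving described above concrete, here is a minimal single-threaded sketch that replays one unlucky ordering of `update` and `rescaleIfNeeded`. It is a simplified model, not the Cassandra implementation: the class name, the two-bucket layout, the helper names, and the constants (a 60 s mean lifetime, a landmark that is 1800 s old) are all assumptions chosen for illustration, and the exponential weight formula is assumed to be the usual forward-decay form.

```java
public class StaleLandmarkSketch {
    // Illustrative value only; not read from the Cassandra source.
    static final double MEAN_LIFETIME_SECONDS = 60.0;

    /** Forward-decay weight of a sample taken at nowSeconds, relative to a landmark. */
    static double weight(double nowSeconds, double landmarkSeconds) {
        return Math.exp((nowSeconds - landmarkSeconds) / MEAN_LIFETIME_SECONDS);
    }

    /**
     * Replays one unlucky interleaving on a single thread and returns the
     * fraction of total weight held by the low-latency bucket afterwards.
     */
    static double lowBucketShareAfterRace(double oldLandmark, double now, int laterSamples) {
        double[] buckets = new double[2]; // [0] = low latency, [1] = high latency
        buckets[1] = 1000.0;              // weight accumulated before the rescale
        double landmark = oldLandmark;

        // Rescaler, step 1: divide every bucket by the current decay factor.
        double rescaleFactor = weight(now, landmark);
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] /= rescaleFactor;
        }

        // Updater: reads the landmark before the rescaler has published the new
        // one, so a pre-rescale weight is added to a post-rescale bucket.
        buckets[0] += weight(now, landmark);

        // Rescaler, step 2: only now is the new landmark published.
        landmark = now;

        // Later samples see the new landmark and are weighted correctly.
        for (int i = 0; i < laterSamples; i++) {
            buckets[1] += weight(now, landmark);
        }
        return buckets[0] / (buckets[0] + buckets[1]);
    }

    public static void main(String[] args) {
        System.out.println("stale weight:   " + weight(1800.0, 0.0));
        System.out.println("correct weight: " + weight(1800.0, 1800.0));
        System.out.println("low-bucket share after race: "
                + lowBucketShareAfterRace(0.0, 1800.0, 1000));
    }
}
```

Under these assumed numbers the single racing sample is heavier than a correctly weighted one by a factor of e^30, so it outweighs the thousand later samples and holds almost all of the weight. That is consistent with the reported symptom of percentiles collapsing into a low bucket for a long time. A real reproduction would need concurrent threads and the actual reservoir class.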



