Dmitry Konstantinov created CASSANDRA-20333:
-----------------------------------------------

             Summary: Reduce DecayingEstimatedHistogramReservoir update cost
                 Key: CASSANDRA-20333
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20333
             Project: Apache Cassandra
          Issue Type: Improvement
          Components: Observability/Metrics
            Reporter: Dmitry Konstantinov


Based on the discussion in CASSANDRA-20250.

[~benedict]:
{quote}We can probably improve our reservoir performance if we want to, perhaps 
in a follow-up patch? For instance, we could have a small thread-local buffer 
of (time, latency) pairs that we periodically flush together, so that we 
amortise the memory latency costs. Or we could explore maintaining a per-thread 
HdrHistogram, that we periodically flush. This would be a good time to explore 
fully migrating to HdrHistogram, as it has built-in merge semantics iirc. I am 
not sure what the decayed version would look like there, but I am certain we 
could maintain a separate decayed HdrHistogram.
Having a thread-local buffer of updates we intend to flush to the histograms 
would amortise the latency penalties without fundamentally redesigning anything 
(as well as reducing contention).

Other possibilities might include e.g. changing the bucket distribution so we 
don't need a LUT for computing lg2, although the above would gracefully handle 
any contribution this has as well.
{quote}
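
To make the thread-local buffer idea above concrete, here is a minimal sketch. It is not the Cassandra implementation: the class name BufferedHistogramSketch, the batch size of 64, and the floor(log2) placeholder bucketing are all assumptions made for illustration. It only shows the amortisation pattern: each thread collects (time, latency) pairs locally and touches the shared counters once per batch.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch: batches (time, latency) samples per thread before touching shared buckets. */
final class BufferedHistogramSketch {
    private static final int BUFFER_SIZE = 64; // assumed batch size, not tuned

    /** Per-thread scratch space: parallel arrays of timestamps and values plus a fill count. */
    private static final class Buffer {
        final long[] times = new long[BUFFER_SIZE];
        final long[] values = new long[BUFFER_SIZE];
        int size;
    }

    private final AtomicLongArray buckets; // shared counters, only written during a flush
    private final ThreadLocal<Buffer> local = ThreadLocal.withInitial(Buffer::new);

    BufferedHistogramSketch(int bucketCount) {
        buckets = new AtomicLongArray(bucketCount);
    }

    /** Records a sample locally; shared memory is touched only when the buffer fills up. */
    void update(long value, long nowMillis) {
        Buffer b = local.get();
        b.times[b.size] = nowMillis;
        b.values[b.size] = value;
        if (++b.size == BUFFER_SIZE) flush(b);
    }

    /** Publishes whatever the calling thread has buffered so far. */
    void flushCurrentThread() {
        flush(local.get());
    }

    private void flush(Buffer b) {
        for (int i = 0; i < b.size; i++) {
            // A real reservoir would also use b.times[i] here to compute the decay weight.
            buckets.incrementAndGet(bucketFor(b.values[i]));
        }
        b.size = 0;
    }

    /** Placeholder bucketing: floor(log2(value)), clamped. Not Cassandra's bucket offsets. */
    private int bucketFor(long value) {
        int idx = value <= 0 ? 0 : 63 - Long.numberOfLeadingZeros(value);
        return Math.min(idx, buckets.length() - 1);
    }

    long count(int bucket) {
        return buckets.get(bucket);
    }
}
```

One trade-off of this sketch: samples sitting in a thread's buffer are invisible to readers until that thread flushes, so a real version would need some policy (a time bound, or a flush on snapshot) for threads that update rarely.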
 

Other ideas for squeezing a bit more performance out of the current design:
 * the bucket id can be calculated once (currently we compute it twice, for the 
decaying and the current buckets), for example:

{code:java}
// compute the striped bucket index once and reuse it for both arrays
int stripe = (int) (Thread.currentThread().getId() & (nStripes - 1));
int bucket = stripedIndex(index, stripe);
rescaledDecayingBuckets.update(bucket, now);
updateBucket(buckets, bucket, 1);
{code}
 * for histograms on highly loaded paths we can use a different number of 
stripes (the default is 2; we could set, for example, 4 for them)
 * I noticed some variation in performance in a micro-benchmark (the existing 
DecayingEstimatedHistogramBench) depending on the exact value used for 
distributionPrime (but I need to double-check this)
 * the forwardDecayWeight function depends on the SampledClock value, so we can 
try to recalculate the weight only when the time changes
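
The last idea in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual reservoir code: the class CachedDecayWeightSketch, the Entry holder, and the recomputations counter are hypothetical, and the weight formula exp((now - landmark) / 1000 / meanLifetimeSeconds) with a 60 second mean lifetime is assumed rather than taken from this ticket. Because a sampled clock returns the same value for many consecutive calls, the exponential only needs to be evaluated when that value changes.

```java
/** Hypothetical sketch: recompute the forward-decay weight only when the sampled clock value changes. */
final class CachedDecayWeightSketch {
    private static final double MEAN_LIFETIME_IN_S = 60.0; // assumed constant

    /** Immutable (time, weight) pair, so a racy read never sees a mismatched time and weight. */
    private static final class Entry {
        final long now;
        final double weight;

        Entry(long now, double weight) {
            this.now = now;
            this.weight = weight;
        }
    }

    private final long decayLandmark;
    private Entry cached;  // benign race: in the worst case a thread recomputes the weight
    int recomputations;    // for illustration only, not thread-safe

    CachedDecayWeightSketch(long decayLandmarkMillis) {
        this.decayLandmark = decayLandmarkMillis;
        this.cached = new Entry(decayLandmarkMillis, 1.0); // exp(0) == 1
    }

    /** Returns the weight for the given clock value, reusing the cached one if time has not moved. */
    double weight(long nowMillis) {
        Entry e = cached;
        if (e.now != nowMillis) {
            e = new Entry(nowMillis, compute(nowMillis));
            cached = e;
        }
        return e.weight;
    }

    private double compute(long nowMillis) {
        recomputations++;
        return Math.exp(((nowMillis - decayLandmark) / 1000.0) / MEAN_LIFETIME_IN_S);
    }
}
```

The cache allocates one small object per clock tick rather than per update. Whether this beats simply calling Math.exp on every update would need to be measured, for example with DecayingEstimatedHistogramBench.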



