Dmitry Konstantinov created CASSANDRA-20333:
-----------------------------------------------
Summary: Reduce DecayingEstimatedHistogramReservoir update cost
Key: CASSANDRA-20333
URL: https://issues.apache.org/jira/browse/CASSANDRA-20333
Project: Apache Cassandra
Issue Type: Improvement
Components: Observability/Metrics
Reporter: Dmitry Konstantinov
Based on the discussions in CASSANDRA-20250
[~benedict]:
{quote}We can probably improve our reservoir performance if we want to, perhaps
in a follow-up patch? For instance, we could have a small thread-local buffer
of (time, latency) pairs that we periodically flush together, so that we
amortise the memory latency costs. Or we could explore maintaining a per-thread
HdrHistogram, that we periodically flush. This would be a good time to explore
fully migrating to HdrHistogram, as it has built-in merge semantics iirc. I am
not sure what the decayed version would look like there, but I am certain we
could maintain a separate decayed HdrHistogram.
Having a thread-local buffer of updates we intend to flush to the histograms
would amortise the latency penalties without fundamentally redesigning anything
(as well as reducing contention).
Other possibilities might include e.g. changing the bucket distribution so we
don't need a LUT for computing lg2, although the above would gracefully handle
any contribution this has as well.
{quote}
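To make the thread-local buffer idea from the quote more concrete, here is a minimal sketch. It is an illustration under assumptions, not a proposed patch: the class ThreadLocalUpdateBuffer, the Sink interface and the capacity parameter are all hypothetical names, and Sink.record stands in for whatever the reservoir's real update path would be.

```java
// Hypothetical sketch: none of these names exist in the Cassandra code base.
final class ThreadLocalUpdateBuffer
{
    // Stand-in for the histogram's real (time, latency) update path.
    interface Sink
    {
        void record(long timeMillis, long latency);
    }

    // Per-thread storage for pending (time, latency) pairs.
    private static final class Buffer
    {
        final long[] times;
        final long[] latencies;
        int size;

        Buffer(int capacity)
        {
            times = new long[capacity];
            latencies = new long[capacity];
        }
    }

    private final int capacity;
    private final Sink sink;
    private final ThreadLocal<Buffer> buffers;

    ThreadLocalUpdateBuffer(int capacity, Sink sink)
    {
        if (capacity <= 0)
            throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.sink = sink;
        this.buffers = ThreadLocal.withInitial(() -> new Buffer(capacity));
    }

    // Append to the calling thread's buffer; flush once the buffer is full.
    void update(long timeMillis, long latency)
    {
        Buffer b = buffers.get();
        b.times[b.size] = timeMillis;
        b.latencies[b.size] = latency;
        if (++b.size == capacity)
            flush(b);
    }

    // Push out whatever the calling thread has buffered so far.
    void flushCurrentThread()
    {
        flush(buffers.get());
    }

    // Apply all pending pairs back to back, so the shared-memory
    // accesses happen together rather than one per update call.
    private void flush(Buffer b)
    {
        for (int i = 0; i < b.size; i++)
            sink.record(b.times[i], b.latencies[i]);
        b.size = 0;
    }
}
```

One consequence of this design is that buffered samples are invisible to readers until a flush happens, so a real implementation would presumably also need a time-based or reader-triggered flush for threads that update rarely.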
Other ideas for squeezing a bit more performance out of the current design:
* the bucket id can be calculated once (currently we calculate it twice, once
for the decaying buckets and once for the current buckets), for example:
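{code:java}
// Pick the stripe once; the mask acts as a modulus when nStripes is a power of two.
int stripe = (int) (Thread.currentThread().getId() & (nStripes - 1));
// Resolve the striped bucket position once and reuse it for both arrays.
int bucket = stripedIndex(index, stripe);
rescaledDecayingBuckets.update(bucket, now);
updateBucket(buckets, bucket, 1);
{code}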
* for histograms on heavily loaded paths we could use a larger number of stripes
(the default is 2; we could set, for example, 4 for them)
* I noticed some variation in the results of an existing micro-benchmark
(DecayingEstimatedHistogramBench) depending on the exact value used for
distributionPrime (but I need to double-check this)
* the forwardDecayWeight function depends on the SampledClock value, so we could
recalculate the weight only when the time changes
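
A rough sketch of how the first and last ideas in the list could fit together is below. Everything in it is an assumption made for illustration: StripedUpdateSketch, CachedWeight, the simplified `index * nStripes + stripe` layout (no distributionPrime), the decay landmark fixed at 0 and the meanLifetimeSeconds parameter are hypothetical and are not taken from DecayingEstimatedHistogramReservoir.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch; not the DecayingEstimatedHistogramReservoir implementation.
final class StripedUpdateSketch
{
    // Immutable (time, weight) pair, so a racy read always sees a consistent pair.
    private static final class CachedWeight
    {
        final long time;
        final long weight;

        CachedWeight(long time, long weight)
        {
            this.time = time;
            this.weight = weight;
        }
    }

    private final int nStripes;               // power of two, required by the mask below
    private final double meanLifetimeSeconds; // assumed decay parameter
    private final AtomicLongArray buckets;
    private final AtomicLongArray decayingBuckets;
    private volatile CachedWeight cached;
    final AtomicInteger recomputations = new AtomicInteger(); // demo counter only

    StripedUpdateSketch(int bucketCount, int nStripes, double meanLifetimeSeconds)
    {
        if (nStripes <= 0 || (nStripes & (nStripes - 1)) != 0)
            throw new IllegalArgumentException("nStripes must be a power of two");
        this.nStripes = nStripes;
        this.meanLifetimeSeconds = meanLifetimeSeconds;
        this.buckets = new AtomicLongArray(bucketCount * nStripes);
        this.decayingBuckets = new AtomicLongArray(bucketCount * nStripes);
    }

    void update(int index, long nowMillis)
    {
        // Stripe and bucket position are computed once and shared by both arrays.
        int stripe = (int) (Thread.currentThread().getId() & (nStripes - 1));
        int bucket = index * nStripes + stripe; // simplified layout
        decayingBuckets.addAndGet(bucket, weight(nowMillis));
        buckets.incrementAndGet(bucket);
    }

    // Recompute the weight only when the (sampled) time differs from the cached one.
    long weight(long nowMillis)
    {
        CachedWeight c = cached;
        if (c == null || c.time != nowMillis)
        {
            // Landmark is assumed to be 0 in this sketch.
            long w = Math.round(Math.exp((nowMillis / 1000.0) / meanLifetimeSeconds));
            c = new CachedWeight(nowMillis, w);
            cached = c;
            recomputations.incrementAndGet();
        }
        return c.weight;
    }

    long count(int index) { return sum(buckets, index); }

    long decayedCount(int index) { return sum(decayingBuckets, index); }

    // Sum one logical bucket across all of its stripes.
    private long sum(AtomicLongArray array, int index)
    {
        long total = 0;
        for (int s = 0; s < nStripes; s++)
            total += array.get(index * nStripes + s);
        return total;
    }
}
```

Whether the cached weight is a net win would need to be checked with a benchmark, since the volatile read and the comparison are not free either; the cache only pays off if the sampled clock value repeats across many consecutive updates.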