zjxxzjwang opened a new pull request, #25874:
URL: https://github.com/apache/pulsar/pull/25874

   
   ### Motivation
   
   `DataSketchesSummaryLogger#registerEvent(long eventLatency, TimeUnit unit)` 
normalizes the input latency into milliseconds before feeding it into the 
quantile sketch / count / sum, which are eventually exposed as a Prometheus 
summary metric. The current implementation is:
   
   ```java
   double valueMillis = unit.toMicros(eventLatency) / 1000.0;
   ```
   
   This two-step scaling has one real problem and one property worth noting:
   
   1. **Sub-microsecond precision loss.** `TimeUnit.toMicros` returns a `long` 
and truncates any fractional microsecond. For example, when the caller passes 
`(1500, TimeUnit.NANOSECONDS)` (i.e. 1.5 µs), `toMicros` truncates the value to 
`1`, and the final result becomes `0.001 ms` instead of the real `0.0015 ms`, a 
~33% loss. For high-precision call sites that use `NANOSECONDS`, this 
systematically biases the quantile metrics downward.
   2. **Overflow saturation.** Both `TimeUnit.toMicros` and `TimeUnit.toNanos` 
saturate to `Long.MAX_VALUE` on overflow. A direct nanosecond conversion 
saturates earlier (at about 292 years, versus about 292,000 years for 
microseconds), so it does not lower the overflow risk. Either bound is far 
beyond any realistic latency, so this does not matter in practice.
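
The truncation described in item 1 can be reproduced with a minimal sketch. The class and method names (`LatencyConversionDemo`, `viaMicros`, `viaNanos`) are hypothetical and only mirror the two conversion expressions quoted in this description; they are not part of `DataSketchesSummaryLogger`.

```java
import java.util.concurrent.TimeUnit;

final class LatencyConversionDemo {
    // Previous expression: integer conversion to microseconds, then scale to ms.
    static double viaMicros(long latency, TimeUnit unit) {
        return unit.toMicros(latency) / 1000.0;
    }

    // Proposed expression: integer conversion to nanoseconds, then scale to ms.
    static double viaNanos(long latency, TimeUnit unit) {
        return unit.toNanos(latency) / 1_000_000.0;
    }

    public static void main(String[] args) {
        long latency = 1500; // 1.5 microseconds, expressed in nanoseconds
        // toMicros(1500 ns) truncates to 1, so the fractional 0.5 us is lost.
        System.out.println(viaMicros(latency, TimeUnit.NANOSECONDS)); // prints 0.001
        // toNanos is exact here, so the floating-point division keeps the fraction.
        System.out.println(viaNanos(latency, TimeUnit.NANOSECONDS));  // prints 0.0015
    }
}
```

Inputs that are whole multiples of a microsecond give the same result under both expressions, so only sub-microsecond `NANOSECONDS` inputs are affected.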
   
   In addition, the variable is named `valueMillis`, but the implementation 
goes through microseconds first and then divides by 1000, which makes the 
intent harder to read for reviewers.
   
   ### Modifications
   
   In 
`pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/metrics/DataSketchesSummaryLogger.java`,
 change the conversion in `registerEvent` from "to micros, then divide by 1000" 
to "to nanos, then divide by 1_000_000.0", so that the floating-point division 
preserves sub-microsecond precision:
   
   ```java
   public void registerEvent(long eventLatency, TimeUnit unit) {
       // Convert via nanoseconds to keep sub-microsecond precision.
       double valueMillis = unit.toNanos(eventLatency) / 1_000_000.0;
       ...
   }
   ```
   
   Behavioral notes:
   - For inputs expressed in `MILLISECONDS` / `MICROSECONDS` / `NANOSECONDS`, 
the result is identical to the previous implementation, except that 
sub-microsecond inputs are now reported with their real fractional value 
instead of being truncated.
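   - For inputs expressed in `SECONDS` and coarser units, `TimeUnit.toNanos` 
applies the standard `Long`-range saturation. It saturates at about 292 years, 
earlier than `toMicros` (about 292,000 years), so the result differs from the 
previous implementation only for inputs beyond that bound, where it is capped 
at `Long.MAX_VALUE / 1_000_000.0` ms.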
   - No caller signature, configuration option, metric name, label, or 
reporting path is changed.
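
The behavioral notes can be checked with a small sketch that compares the two expressions on an ordinary input and on an extreme one. `SaturationDemo` and its helpers are hypothetical names used for illustration, and 200,000 days is an arbitrary input chosen to exceed the roughly 292-year range of a nanosecond count held in a `long`.

```java
import java.util.concurrent.TimeUnit;

final class SaturationDemo {
    // Previous expression: microseconds first, then scale to milliseconds.
    static double viaMicros(long latency, TimeUnit unit) {
        return unit.toMicros(latency) / 1000.0;
    }

    // Proposed expression: nanoseconds first, then scale to milliseconds.
    static double viaNanos(long latency, TimeUnit unit) {
        return unit.toNanos(latency) / 1_000_000.0;
    }

    // True when the nanosecond conversion has hit the Long.MAX_VALUE cap.
    static boolean saturatesInNanos(long latency, TimeUnit unit) {
        return unit.toNanos(latency) == Long.MAX_VALUE;
    }

    // True when the microsecond conversion has hit the Long.MAX_VALUE cap.
    static boolean saturatesInMicros(long latency, TimeUnit unit) {
        return unit.toMicros(latency) == Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Ordinary input: both expressions agree.
        System.out.println(viaMicros(5, TimeUnit.MILLISECONDS)); // prints 5.0
        System.out.println(viaNanos(5, TimeUnit.MILLISECONDS));  // prints 5.0

        // Extreme input (about 547 years): only the nanosecond path saturates.
        long days = 200_000;
        System.out.println(saturatesInNanos(days, TimeUnit.DAYS));  // prints true
        System.out.println(saturatesInMicros(days, TimeUnit.DAYS)); // prints false
    }
}
```

For realistic latencies the two paths agree, apart from the sub-microsecond fraction that the nanosecond path keeps.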
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change is a trivial rework / code cleanup that adds no new tests.
   
   > This is an internal numerical-conversion fix. The external behavior of the 
summary metric (name, labels, reporting cadence, caller signatures) is 
unchanged; it only makes the quantiles / sum more accurate for sub-microsecond 
inputs. Existing tests around `DataSketchesSummaryLogger` and the summary 
metrics path already cover this code.
   
   ### Does this pull request potentially affect one of the following parts:
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   
   > Note: "The metrics" is intentionally left unchecked. This PR does not add, 
remove, or rename any metric, and does not change the semantics of any metric. 
It only improves numerical accuracy for sub-microsecond inputs of an existing 
summary, which is a bug-fix-style correction rather than a metric-contract 
change.
   
   ---

