codelipenghui opened a new pull request, #24726:
URL: https://github.com/apache/pulsar/pull/24726

   ## Summary
   
   This PR implements broker-level metrics for tracking non-recoverable data 
skips in Apache Pulsar, providing operational visibility for data loss events.
   
   ### Key Features Added
   
   - **NonRecoverableDataMetricsCallback Interface**: New callback interface in the managed-ledger module, so the managed-ledger and broker modules do not need to depend on each other
   - **ManagedLedger Integration**: Updated `ManagedLedgerImpl.skipNonRecoverableLedger()` to invoke the callback when a ledger is skipped
   - **ManagedCursor Integration**: Updated 
`ManagedCursorImpl.skipNonRecoverableEntries()` to count and report skipped 
entries
   - **BrokerService Configuration**: Automatic callback setup during managed 
ledger creation for topics
   - **Dual Metrics Support**: Both metrics are exposed through Prometheus and through OpenTelemetry
   
   ### New Metrics
   
   1. **`pulsar.broker.non.recoverable.ledgers.skipped.count`** - Tracks the 
number of entire ledgers skipped due to non-recoverable issues
   2. **`pulsar.broker.non.recoverable.entries.skipped.count`** - Tracks the 
number of individual entries skipped due to non-recoverable issues
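
Both metrics only ever increase, so on the broker side a thread-safe counter per metric is enough. The sketch below is a minimal illustration under stated assumptions: `SkippedDataCounters` and its methods are hypothetical names (this is not the PR's `BrokerOperabilityMetrics`), the counters use `java.util.concurrent.atomic.LongAdder`, and the actual Prometheus and OpenTelemetry registration is left out.

```java
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo entry point; not part of the PR.
class SkippedCountersDemo {
    public static void main(String[] args) {
        SkippedDataCounters counters = new SkippedDataCounters();
        counters.recordLedgerSkipped();
        counters.recordEntriesSkipped(42);
        // Prints both counters keyed by the metric names listed above
        // (map iteration order is unspecified).
        System.out.println(counters.snapshot());
    }
}

// Hypothetical holder for the two counters. LongAdder keeps increments cheap
// when many managed ledgers and cursors report at the same time.
class SkippedDataCounters {
    static final String LEDGERS_SKIPPED = "pulsar.broker.non.recoverable.ledgers.skipped.count";
    static final String ENTRIES_SKIPPED = "pulsar.broker.non.recoverable.entries.skipped.count";

    private final LongAdder ledgersSkipped = new LongAdder();
    private final LongAdder entriesSkipped = new LongAdder();

    void recordLedgerSkipped() {
        ledgersSkipped.increment();
    }

    void recordEntriesSkipped(long count) {
        // Counters only move forward, so non-positive counts are ignored.
        if (count > 0) {
            entriesSkipped.add(count);
        }
    }

    long ledgersSkipped() {
        return ledgersSkipped.sum();
    }

    long entriesSkipped() {
        return entriesSkipped.sum();
    }

    // Read-only view that an exporter could poll; real exporters are out of scope.
    Map<String, Long> snapshot() {
        return Map.of(LEDGERS_SKIPPED, ledgersSkipped(), ENTRIES_SKIPPED, entriesSkipped());
    }
}
```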
   
   ### Design Decisions
   
   - **Always Available**: Metrics are recorded whether or not a `ManagedLedgerInterceptor` is configured
   - **Callback Pattern**: Uses callback interface to avoid introducing 
dependencies between managed-ledger and broker modules
   - **Automatic Setup**: Callbacks are automatically configured by 
`BrokerService` during topic creation
   - **Null-Safe**: Every call site checks for a null callback before invoking it
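
The callback pattern and the null-safety rule can be sketched in plain Java. In this sketch the method names `onLedgerSkipped` and `onEntriesSkipped` and the `LedgerSide` class are hypothetical; the PR's actual `NonRecoverableDataMetricsCallback` signatures may differ. The point it illustrates is that the managed-ledger side only knows the interface, while the broker side supplies the implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

class CallbackPatternDemo {
    public static void main(String[] args) {
        // Broker side: the only code that knows where the numbers end up.
        AtomicLong ledgers = new AtomicLong();
        AtomicLong entries = new AtomicLong();
        NonRecoverableDataMetricsCallback callback = new NonRecoverableDataMetricsCallback() {
            @Override
            public void onLedgerSkipped(long ledgerId) {
                ledgers.incrementAndGet();
            }

            @Override
            public void onEntriesSkipped(long count) {
                entries.addAndGet(count);
            }
        };

        new LedgerSide(callback).skipLedger(7L);
        // No callback configured: the skip still runs, nothing is reported.
        new LedgerSide(null).skipLedger(8L);

        // prints "ledgers=1 entries=0"
        System.out.println("ledgers=" + ledgers.get() + " entries=" + entries.get());
    }
}

// Hypothetical shape of the interface. It lives in the managed-ledger module,
// so it must not reference any broker types.
interface NonRecoverableDataMetricsCallback {
    void onLedgerSkipped(long ledgerId);

    void onEntriesSkipped(long count);
}

// Hypothetical managed-ledger-side stand-in holding an optional callback.
class LedgerSide {
    private final NonRecoverableDataMetricsCallback callback; // may be null

    LedgerSide(NonRecoverableDataMetricsCallback callback) {
        this.callback = callback;
    }

    void skipLedger(long ledgerId) {
        // ... the real skip logic would run here ...
        if (callback != null) {
            callback.onLedgerSkipped(ledgerId);
        }
    }
}
```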
   
   ### Testing Coverage
   
   - **7 Unit Tests**: Cover the callback integration in `NonRecoverableDataCallbackTest`
   - **3 Integration Tests**: End-to-end validation including real 
topic/subscription creation in `OpenTelemetryBrokerOperabilityStatsTest`
   - **Checkstyle Clean**: All code passes the Apache Pulsar Checkstyle rules
   
   ### Implementation Details
   
   The implementation uses a callback pattern where:
   1. `BrokerService` sets up the `NonRecoverableDataMetricsCallback` during 
managed ledger configuration
   2. `ManagedLedgerImpl` calls the callback when `skipNonRecoverableLedger()` 
is invoked
   3. `ManagedCursorImpl` calls the callback with the actual number of skipped entries when `skipNonRecoverableEntries()` is invoked
   4. The callback updates `BrokerOperabilityMetrics` which maintains both 
Prometheus and OpenTelemetry counters
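
The four steps above can be condensed into one runnable sketch. Every type here is a hypothetical stand-in: `LedgerConfig` for the managed ledger configuration, `Ledger` and `Cursor` for `ManagedLedgerImpl` and `ManagedCursorImpl` (with simplified signatures, not the real ones), `SkipMetrics` for `BrokerOperabilityMetrics`, and the callback method names are assumptions. For simplicity it also assumes that a skipped entry range lies within a single ledger and is inclusive at both ends; a real cursor has to handle ranges that span several ledgers.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical walk-through of the four steps; every type is a stand-in.
class SkipFlowDemo {
    public static void main(String[] args) {
        SkipMetrics metrics = new SkipMetrics();

        // Step 1: the broker installs the callback while building the config.
        LedgerConfig config = new LedgerConfig();
        config.callback = new NonRecoverableDataMetricsCallback() {
            @Override
            public void onLedgerSkipped(long ledgerId) {
                metrics.ledgersSkipped.increment();
            }

            @Override
            public void onEntriesSkipped(long count) {
                metrics.entriesSkipped.add(count);
            }
        };

        // Step 2: the ledger reports one whole skipped ledger.
        new Ledger(config).skipNonRecoverableLedger(10L);
        // Step 3: the cursor counts entries 0..99 of ledger 11 and reports 100.
        new Cursor(config).skipNonRecoverableEntries(11L, 0L, 99L);

        // Step 4: the counters reflect both events.
        // prints "1 ledger(s), 100 entries"
        System.out.println(metrics.ledgersSkipped.sum() + " ledger(s), "
                + metrics.entriesSkipped.sum() + " entries");
    }
}

// Hypothetical callback shape (method names are assumptions).
interface NonRecoverableDataMetricsCallback {
    void onLedgerSkipped(long ledgerId);

    void onEntriesSkipped(long count);
}

// Stand-in for the managed ledger configuration.
class LedgerConfig {
    NonRecoverableDataMetricsCallback callback; // null if none was configured
}

// Stand-in for BrokerOperabilityMetrics: just the two counters.
class SkipMetrics {
    final LongAdder ledgersSkipped = new LongAdder();
    final LongAdder entriesSkipped = new LongAdder();
}

// Stand-in for ManagedLedgerImpl (simplified signature).
class Ledger {
    private final LedgerConfig config;

    Ledger(LedgerConfig config) {
        this.config = config;
    }

    void skipNonRecoverableLedger(long ledgerId) {
        // ... drop ledgerId from the ledger list here ...
        NonRecoverableDataMetricsCallback cb = config.callback;
        if (cb != null) {
            cb.onLedgerSkipped(ledgerId);
        }
    }
}

// Stand-in for ManagedCursorImpl (simplified signature).
class Cursor {
    private final LedgerConfig config;

    Cursor(LedgerConfig config) {
        this.config = config;
    }

    // Assumes both entry IDs belong to ledgerId and the range is inclusive.
    long skipNonRecoverableEntries(long ledgerId, long firstEntryId, long lastEntryId) {
        if (lastEntryId < firstEntryId) {
            return 0;
        }
        long count = lastEntryId - firstEntryId + 1;
        // ... mark the range as deleted for this cursor here ...
        NonRecoverableDataMetricsCallback cb = config.callback;
        if (cb != null) {
            cb.onEntriesSkipped(count);
        }
        return count;
    }
}
```

Reading the callback into a local variable before the null check keeps each call site consistent even if the configuration field were reassigned concurrently.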
   
   ## Test plan
   
   - [x] Unit tests pass (7/7 in managed-ledger module)
   - [x] Integration tests pass (3/3 in broker module) 
   - [x] Checkstyle validation passes (0 violations)
   - [x] End-to-end testing with real topics, subscriptions, and message 
positions
   - [x] Callback integration verified through actual broker service topic 
creation
   
   🤖 Generated with [Claude Code](https://claude.ai/code)


-- 
This is an automated message from the Apache Git Service.