codelipenghui opened a new pull request, #24726: URL: https://github.com/apache/pulsar/pull/24726
## Summary

This PR implements broker-level metrics for tracking non-recoverable data skips in Apache Pulsar, providing operational visibility for data loss events.

### Key Features Added

- **NonRecoverableDataMetricsCallback Interface**: New callback interface in managed-ledger module to avoid cross-module dependencies
- **ManagedLedger Integration**: Updated `ManagedLedgerImpl.skipNonRecoverableLedger()` to trigger callback when ledgers are skipped
- **ManagedCursor Integration**: Updated `ManagedCursorImpl.skipNonRecoverableEntries()` to count and report skipped entries
- **BrokerService Configuration**: Automatic callback setup during managed ledger creation for topics
- **Dual Metrics Support**: Both Prometheus and OpenTelemetry metrics implementation

### New Metrics

1. **`pulsar.broker.non.recoverable.ledgers.skipped.count`** - Tracks the number of entire ledgers skipped due to non-recoverable issues
2. **`pulsar.broker.non.recoverable.entries.skipped.count`** - Tracks the number of individual entries skipped due to non-recoverable issues

### Design Decisions

- **Always Available**: Metrics work regardless of `ManagedLedgerInterceptor` configuration state
- **Callback Pattern**: Uses callback interface to avoid introducing dependencies between managed-ledger and broker modules
- **Automatic Setup**: Callbacks are automatically configured by `BrokerService` during topic creation
- **Null-Safe**: All implementations handle null callback gracefully

### Testing Coverage

- **7 Unit Tests**: Comprehensive testing of callback integration in `NonRecoverableDataCallbackTest`
- **3 Integration Tests**: End-to-end validation including real topic/subscription creation in `OpenTelemetryBrokerOperabilityStatsTest`
- **Checkstyle Clean**: All code follows Apache Pulsar coding standards

### Implementation Details

The implementation uses a callback pattern where:

1. `BrokerService` sets up the `NonRecoverableDataMetricsCallback` during managed ledger configuration
2. `ManagedLedgerImpl` calls the callback when `skipNonRecoverableLedger()` is invoked
3. `ManagedCursorImpl` calls the callback when `skipNonRecoverableEntries()` is invoked with actual entry counts
4. The callback updates `BrokerOperabilityMetrics` which maintains both Prometheus and OpenTelemetry counters

## Test plan

- [x] Unit tests pass (7/7 in managed-ledger module)
- [x] Integration tests pass (3/3 in broker module)
- [x] Checkstyle validation passes (0 violations)
- [x] End-to-end testing with real topics, subscriptions, and message positions
- [x] Callback integration verified through actual broker service topic creation

🤖 Generated with [Claude Code](https://claude.ai/code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
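
For readers unfamiliar with the callback pattern described under Implementation Details, here is a minimal, self-contained sketch of the idea. It is not the PR's code: every type and method name in it (`SkipCallback`, `SkipMetrics`, `LedgerStub`, `onLedgerSkipped`, `onEntriesSkipped`) is hypothetical, and plain `LongAdder` counters stand in for the Prometheus and OpenTelemetry instruments. It only illustrates how a lower-level module can report skips through an interface it owns, with a null callback treated as a no-op.

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch only: all names here are hypothetical stand-ins,
// not the actual signatures introduced by the PR.
final class NonRecoverableSkipSketch {

    /** Stand-in for a callback interface owned by the lower-level module. */
    interface SkipCallback {
        void onLedgerSkipped(long ledgerId);
        void onEntriesSkipped(long entryCount);
    }

    /** Stand-in for the broker-side metrics holder that implements the callback. */
    static final class SkipMetrics implements SkipCallback {
        private final LongAdder ledgersSkipped = new LongAdder();
        private final LongAdder entriesSkipped = new LongAdder();

        @Override
        public void onLedgerSkipped(long ledgerId) {
            ledgersSkipped.increment();
        }

        @Override
        public void onEntriesSkipped(long entryCount) {
            // Ignore non-positive counts so the counter never decreases.
            if (entryCount > 0) {
                entriesSkipped.add(entryCount);
            }
        }

        long ledgersSkipped() {
            return ledgersSkipped.sum();
        }

        long entriesSkipped() {
            return entriesSkipped.sum();
        }
    }

    /** Stand-in for the storage-side component that reports skips. */
    static final class LedgerStub {
        private final SkipCallback callback; // may be null when no metrics are wired

        LedgerStub(SkipCallback callback) {
            this.callback = callback;
        }

        void skipLedger(long ledgerId) {
            // Real code would mark the ledger as skipped before reporting it.
            if (callback != null) {
                callback.onLedgerSkipped(ledgerId);
            }
        }

        void skipEntries(long entryCount) {
            if (callback != null) {
                callback.onEntriesSkipped(entryCount);
            }
        }
    }

    public static void main(String[] args) {
        SkipMetrics metrics = new SkipMetrics();
        LedgerStub ledger = new LedgerStub(metrics);

        ledger.skipLedger(42L);
        ledger.skipEntries(5L);
        ledger.skipEntries(3L);

        // A stub created without a callback must not throw.
        new LedgerStub(null).skipLedger(7L);

        // Prints "1 ledgers, 8 entries".
        System.out.println(metrics.ledgersSkipped() + " ledgers, "
                + metrics.entriesSkipped() + " entries");
    }
}
```

Because the interface lives with the component that raises the event, the reporting side needs no compile-time knowledge of the metrics implementation, which is the dependency direction the PR description aims for.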
