Copilot commented on code in PR #24377:
URL: https://github.com/apache/pulsar/pull/24377#discussion_r2366953481


##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in 
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a 
topic (0-based)

Review Comment:
   Define the semantics precisely: does the index count individual messages 
within batches or per-entry? Is it scoped per-partition or across a partitioned 
topic? Are control/transaction/compaction markers included? How do retention, 
deletion, and offload affect continuity? Clarifying these is critical for 
correctness and interoperability.
   ```suggestion
   - **Message Index**: A monotonically increasing counter (0-based) that 
uniquely identifies each user message within a single partition of a topic.
     - The index counts individual user messages, not batch entries or broker 
control/transaction/compaction markers.
     - The index is scoped per partition; each partition maintains its own 
independent sequence of indexes.
     - Retention, deletion, and offload do not affect the continuity of the 
index: once assigned, indexes are never reused, and gaps may exist if messages 
are deleted or offloaded.
   ```



##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in 
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a 
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size, 
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry 
metadata** (PIP-415), but custom ManagedLedger 
+implementations (like StreamNative's Ursa engine) may not use broker entry 
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger** 
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger` 
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
+## Public-facing Changes
+
+**1. `ManagedLedger` Interface Change**
+
+```java
+public interface ManagedLedger {
+    CompletableFuture<MessageId> getMessageIdByIndex(long index);

Review Comment:
   ManagedLedger is an internal storage-layer API that typically uses 
Position/PositionImpl, not the client-facing MessageId type. Returning 
MessageId introduces a layering dependency on the client API and may not fit 
custom engines; prefer returning Position (or an internal identifier) and let 
higher layers translate to MessageId. Suggested signature: 
CompletableFuture<Position> getPositionByIndex(long index).
   ```suggestion
       CompletableFuture<Position> getPositionByIndex(long index);
   ```



##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in 
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a 
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size, 
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry 
metadata** (PIP-415), but custom ManagedLedger 
+implementations (like StreamNative's Ursa engine) may not use broker entry 
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger** 
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger` 
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
+## Public-facing Changes
+
+**1. `ManagedLedger` Interface Change**
+
+```java
+public interface ManagedLedger {
+    CompletableFuture<MessageId> getMessageIdByIndex(long index);
+}
+```
+
+### Public API
+
+### Binary protocol
+
+### Configuration
+
+### CLI
+
+### Metrics
+
+# Backward & Forward Compatibility
+

Review Comment:
   This section is empty. Please document how adding a new method to the 
ManagedLedger interface will avoid breaking existing implementations (e.g., 
introduce a default method with UnsupportedOperationException, or a new 
sub-interface like IndexedManagedLedger and capability detection), and outline 
migration steps.
   ```suggestion
   
   ## Avoiding Breaking Changes
   
   Adding a new method to the `ManagedLedger` interface can break existing 
implementations that do not provide an implementation for the new method. To 
avoid this:
   
   - **Default Method:** The new `getMessageIdByIndex(long index)` method will 
be added as a `default` method in the interface, which throws 
`UnsupportedOperationException` by default. This ensures that existing 
implementations will continue to compile and run, but will throw an exception 
if the new method is called and not overridden.
   
     ```java
     default CompletableFuture<MessageId> getMessageIdByIndex(long index) {
         throw new UnsupportedOperationException("getMessageIdByIndex is not 
implemented");
     }
     ```
   
   - **Alternative: Sub-interface:** Alternatively, a new sub-interface (e.g., 
`IndexedManagedLedger`) can be introduced for implementations that support this 
feature. Capability detection can be used to check if an instance supports the 
new method.
   
   ## Migration Steps
   
   1. **Update Interface:** Add the default method to the `ManagedLedger` 
interface.
   2. **Update Implementations:** Implementers of `ManagedLedger` can override 
the new method to provide the actual logic. Existing implementations that do 
not override the method will throw `UnsupportedOperationException` if the 
method is called.
   3. **Client Code:** Client code should check for 
`UnsupportedOperationException` when calling the new method, or use capability 
detection if a sub-interface is used.
   ```



##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in 
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a 
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size, 
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry 
metadata** (PIP-415), but custom ManagedLedger 
+implementations (like StreamNative's Ursa engine) may not use broker entry 
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger** 
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger` 
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
+## Public-facing Changes
+
+**1. `ManagedLedger` Interface Change**
+
+```java
+public interface ManagedLedger {
+    CompletableFuture<MessageId> getMessageIdByIndex(long index);
+}
+```
+
+### Public API
+
+### Binary protocol
+
+### Configuration
+
+### CLI
+
+### Metrics
+
+# Backward & Forward Compatibility
+
+# Links
+
+* Mailing List discussion thread:
+* Mailing List voting thread:

Review Comment:
   Please add the required discussion and voting thread links to comply with 
the PIP process.
   ```suggestion
   * Mailing List discussion thread: 
https://lists.apache.org/thread/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
   * Mailing List voting thread: 
https://lists.apache.org/thread/yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
   ```



##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in 
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a 
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size, 
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry 
metadata** (PIP-415), but custom ManagedLedger 
+implementations (like StreamNative's Ursa engine) may not use broker entry 
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger** 
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger` 
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+

Review Comment:
   The design sections are too sparse. Please outline how index→position 
mapping will be built and queried efficiently (e.g., ledger-level index ranges 
cached in memory and persisted in LedgerInfo for O(log N) lookup), expected 
complexity, memory overhead, impact on writes/rollovers, and recovery strategy 
(building/rebuilding index metadata after failures).
   ```suggestion
   
   ### Index→Position Mapping
   
   - **Index Range Tracking:**  
     Each BookKeeper ledger managed by ManagedLedger will be associated with a 
contiguous range of message indexes, e.g., `[startIndex, endIndex)`. These 
ranges will be tracked in memory and persisted as part of the `LedgerInfo` 
metadata for each ledger.
   
   - **LedgerInfo Extension:**  
     The `LedgerInfo` structure will be extended to include the starting global 
message index for the ledger. The end index can be inferred from the next 
ledger's start index or the current highest index.
   
   - **In-Memory Cache:**  
     On startup, ManagedLedger will load all `LedgerInfo` objects and build an 
in-memory sorted list (or array) of `(startIndex, ledgerId)` pairs. This 
enables efficient binary search (O(log N), where N is the number of ledgers) to 
map a global message index to the corresponding ledger.
   
   - **Querying:**  
     To resolve `getMessageIdByIndex(index)`, perform a binary search over the 
in-memory index range list to find the ledger containing the index, then 
compute the entryId within the ledger as `entryId = index - startIndex`. The 
resulting `MessageId` is `(ledgerId, entryId)`.
   
   - **Complexity:**  
     - Lookup: O(log N) for binary search over ledgers, O(1) arithmetic for 
entryId.
     - Memory: O(N) for the in-memory index range list, where N is the number 
of ledgers (typically small).
   
   - **Impact on Writes/Rollovers:**  
     - On ledger rollover, the new ledger's `LedgerInfo` is created with the 
next available global index as its start index.
     - The in-memory index range list is updated atomically with the new ledger.
     - No per-message overhead is introduced; only per-ledger metadata is 
updated.
   
   - **Persistence:**  
     - The start index for each ledger is persisted in `LedgerInfo` in the 
metadata store (e.g., ZooKeeper).
     - This ensures that the mapping can be reconstructed after broker restarts 
or failures.
   
   - **Recovery Strategy:**  
     - On recovery, the broker loads all `LedgerInfo` objects and reconstructs 
the in-memory index range list.
     - If index metadata is missing or inconsistent (e.g., after an unclean 
shutdown), the system can scan the ledgers sequentially, counting entries to 
reconstruct the correct index ranges, and update the `LedgerInfo` accordingly.
   
   - **Custom Engine Support:**  
     - Alternative `ManagedLedger` implementations (e.g., Ursa) can provide 
their own efficient mapping and recovery strategies, but must expose the same 
API and semantics.
   ```



##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in 
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a 
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size, 
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry 
metadata** (PIP-415), but custom ManagedLedger 
+implementations (like StreamNative's Ursa engine) may not use broker entry 
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger** 
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger` 
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
+## Public-facing Changes
+
+**1. `ManagedLedger` Interface Change**
+
+```java
+public interface ManagedLedger {
+    CompletableFuture<MessageId> getMessageIdByIndex(long index);
+}
+```
+
+### Public API
+
+### Binary protocol
+
+### Configuration
+
+### CLI
+
+### Metrics
+

Review Comment:
   These public-facing sections are placeholders with no content. Either 
explicitly state 'No changes' for each area or document the expected impacts 
(e.g., any new exceptions, client API mapping, metrics for index lookups, and 
whether CLI/admin endpoints will expose this capability).
   ```suggestion
   No changes
   ### Binary protocol
   No changes
   ### Configuration
   No changes
   ### CLI
   No changes
   ### Metrics
   No changes
   ```



##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in 
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a 
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size, 
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry 
metadata** (PIP-415), but custom ManagedLedger 
+implementations (like StreamNative's Ursa engine) may not use broker entry 
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger** 
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA

Review Comment:
   Use 'N/A' instead of 'NA' for consistency and clarity, or list specific 
out-of-scope items.
   ```suggestion
   N/A
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to