wolfstudy commented on PR #25774:
URL: https://github.com/apache/pulsar/pull/25774#issuecomment-4466761361

   > Thanks @wolfstudy for the proposal.
   > 
   > **My main concern is architectural rather than technical: I'd like this to 
build on the storage-layer pluggability introduced by 
[PIP-384](https://github.com/apache/pulsar/blob/master/pip/pip-384.md) rather 
than adding this complexity to core Pulsar.**
   > 
   > PIP-384 deliberately decoupled `ManagedLedgerStorage` and introduced 
`ManagedLedgerStorageClass` so that the BookKeeper client is owned by a storage 
class, and a broker can hold multiple storage classes — each topic resolving to 
one of them via persistence policy. The framing PIP-384 establishes is "this 
topic lives on this storage backend." The core essence of PIP-477 — moving 
writes to a new BK cluster while preserving readability of historical data on 
the old one — fits naturally as an extension of that framing: **a topic can 
move between storage classes, with per-ledger attribution determining which 
storage class serves each historical read**.
   > 
   > Concretely, I think the path forward looks like this:
   > 
   > 1. **Implement PIP-477 first as a custom set of implementations of the 
existing interfaces, outside of core Pulsar.** A custom `ManagedLedgerStorage` 
can hold a `Map<clusterName, BookkeeperManagedLedgerStorageClass>` and expose 
the active one — this covers what `DynamicBookKeeperClientFactory` does. The 
orchestrator state machine, the Broker-ZK registry under 
`/admin/bookie-clusters/*`, the watcher, and the CLI/REST surface can live in 
the plugin module. The REST endpoints under `/admin/v2/bookie-clusters/*` can 
be exposed via 
[`AdditionalServlet`](https://github.com/apache/pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/web/plugin/servlet/AdditionalServlet.java)
 without touching core REST classes.
   > 2. **The per-ledger attribution slot already exists in core.** 
[PIP-404](https://github.com/apache/pulsar/blob/master/pip/pip-404.md) added 
`repeated KeyValue properties = 6` to `LedgerInfo` and exposed 
`ManagedLedger#asyncAddLedgerProperty` / `asyncRemoveLedgerProperty` / 
`asyncGetLedgerProperty` precisely as a generic extension slot for plugins. The 
`★ new` `repeated KeyValue properties = 7` proposed in §"proto Changes" of 
PIP-477 is duplicating what's already there — `bookieClusterName` can be 
encoded as a property in the existing `LedgerInfo.properties` slot today, with 
no core proto change. `ManagedCursorInfo` similarly already has 
`cursorProperties` (a `repeated StringProperty` at tag 8) which can carry the 
same attribution.
   > 3. **The CLI surface is also already pluggable.** 
[PIP-201](https://github.com/apache/pulsar/blob/master/pip/pip-201.md) added 
the `CustomCommandFactory` / `CustomCommandGroup` / `CustomCommand` SPI (in 
`pulsar-client-tools-api`), bundled as a `.nar`, so `pulsar-admin 
bookie-clusters …` can be shipped entirely in the plugin with no `pulsar-admin` 
core change.
   > 4. **Where there are real SPI gaps, add narrow hooks — don't inline the 
feature.** A custom `ManagedLedgerFactory` returned by the storage class 
already gets you most of the way; the remainder should be small, focused 
additions, narrower than PIP-477 and useful to anyone writing a custom storage 
class.
   > 5. **If the plugin proves valuable to other users of Pulsar later, it can 
be moved into the Pulsar tree** as a separate module shipped alongside the 
broker, opt-in via configuration. The ~99% of users who never use this feature 
pay zero cost in core surface area, the orchestrator code can evolve on its own 
cadence.
   > 
   > This approach also provides a forcing function for a cleaner abstraction. 
PIP-384 currently assumes "one topic, one storage class for its lifetime." 
Generalizing to "one topic, one storage class at a time, with 
attribution-driven routing for previously-written ledgers" is a natural and 
reusable extension — useful well beyond the BK-cluster-switching scenario 
(e.g., per-topic storage-tier migration, gradual rollout of an alternative 
`ManagedLedgerStorage` implementation).
   > 
   > Would you be open to reframing the PIP along these lines?
   > 
   > Happy to discuss further on the dev list or here.
   
   Thanks for the thorough review. I agree to reframe PIP-470 along the PIP-384 
extension axis:
   
   - **Drop the new proto fields**. Encode bookieClusterName into 
LedgerInfo.properties (already added by PIP-404) and 
ManagedCursorInfo.cursorProperties (already exists). The remaining open 
question is SchemaStorageFormat.PositionInfo, where no equivalent properties 
slot exists today — I'll either propose a tiny additive slot or fold 
attribution into SchemaLocator properties as a follow-up.
   
   - **Ship as a plugin NAR module** (pulsar-bookie-cluster-switching) that 
provides a custom ManagedLedgerStorage holding a Map<clusterName, 
BookkeeperManagedLedgerStorageClass>. Active cluster selection, ZK registry 
under /admin/bookie-clusters/*, watcher, and orchestrator 
(BUILD/PROMOTE/CLEANUP) all live in the plugin.
   
   - **REST surface** moves to AdditionalServlet; CLI moves to PIP-201 
CustomCommandFactory NAR.
   
   - **Narrow core hooks** I still need (these would be useful to any custom 
storage class, not just BK switching):
   
      - ManagedLedgerConfig.activeBookKeeperSupplier — per-call active BK 
resolution on the write path.
      - BookkeeperSchemaStorage.schemaReadBookKeeperResolver — analogous 
read-path resolution for schema ledgers.
      - An owner-broker-callable hook for cursor CAS under the 
ManagedCursorImpl lock (or expose it as an internal endpoint via 
AdditionalServlet).
   - **Opt-in via configuration**; the ~99% of users that don't switch BK 
clusters pay zero cost in core.
   
   I'll update the PIP draft accordingly and post the revised version on the 
dev list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to