lhotari commented on PR #25706: URL: https://github.com/apache/pulsar/pull/25706#issuecomment-4406950463
> Based on the discussion so far, I've restructured the **Alternatives** section of the PIP. The key change is separating the analysis into distinct layers: > > 1. **Problem Classification (P1–P5)**: First, explicitly enumerate the cascading problems that arise when a Consumer is stuck under Key_Shared, and classify which ones this PIP **must** solve versus which can be mitigated through other means. I believe we should align on this before debating solutions — not every problem needs to be solved in this PIP, and a scheme should not be rejected simply because it doesn't address secondary concerns. This is helpful. I guess this is also about drilling down deeper into the actual problem and causes. > 2. **Core Problem Resolution Table**: A side-by-side comparison of all alternatives (Overflow ML + A through E) against each problem, making it clear what each scheme does and does not solve. Describing tradeoffs is useful. Expressing them in a table can be challenging, since explaining what each decision is trading off often requires more detail than a table can hold. That said, a table can still work as a high-level summary. Some characteristics of tradeoffs, such as "complexity", can be highly opinionated. One person's "complexity" might mean something quite different to someone else. In many cases, complexity is just anything unfamiliar, which isn't a sound way to qualify designs. In distributed systems, anything that increases the number of distributed interactions adds real complexity to the solution. > 3. **Cost Table**: What each alternative introduces in terms of extra storage, I/O, sustained hot-key viability, and client transparency. The impact on reliability and the risk of incidents are also important factors to weigh as part of "cost". > 4. **Summary**: A concise evaluation of why Overflow ML and Auxiliary Cursor are the only two schemes that address both P1 and P2, and the trade-offs between them. Deciding on the solution too early has always been a problem with the Pulsar PIP process. Ideally, discussions and interactions would happen in the early stages, while everyone is still open to different solutions. Otherwise, it easily turns into a situation where people feel they need to defend their point of view, and the synergy that a team can find together isn't leveraged. What could as a result of this that a real problem doesn't get solved at all due to lack of consensus and initially motivated contributors turning away. Every decision and solution is a tradeoff. Making decisions reversible or solutions optional is extremely useful to reduce risk and avoiding increased maintenance overhead. Instead of adding more conditionals to the existing PersistentDispatcherMultipleConsumers/PersistentStickyKeyDispatcherMultipleConsumers classes, it would be preferable to introduce method hooks so that another, more specialized implementation can plug in. That would avoid piling more maintenance overhead onto the existing classes when a special-case implementation such as the overflow ML is added to the codebase. Another possibility is to add configuration so that a third-party implementation can be used for Key_Shared or Shared subscription dispatchers. One way for you to make progress regardless of what gets decided about PIP-474 is to add this possibility — refactor the dispatcher classes with hooks so that the implementation can be extended, and add configuration so that the broker can use a third-party dispatcher implementation. Such a change could be handled in a separate PIP, and it would also be useful for other use cases. This type of solution would also be a good fit for implementing PIP-474 as a separate dispatcher implementation initially and have it later be merged into Pulsar when it's a proven solution. We could possibly also backport the "pluggable dispatcher" solution to maintenance releases since the risk of it causing regressions is relatively low. That way you could solve your problem about hot/slow keys as soon as possible and gain real production feedback of the implementation. > This structure also has a practical benefit for PR review: each point (problem classification, per-problem resolution, cost dimension) can be discussed independently in its own comment thread, rather than getting lost in a single long-form reply. > > The updated PIP is ready for review. I'd suggest we start by aligning on the problem classification (the P1–P5 table) — once we agree on what must be solved, the solution comparison follows naturally Makes sense. I'd just suggest taking a look at the "pluggable dispatcher" idea and focusing on that first in a separate PIP. That would let you make progress quickly without getting stalled by the decisions in this PIP. The "pluggable dispatcher" could also be considered a prerequisite for this PIP. Besides the dispatcher itself, there might be a need to extend the interface so that the dispatcher can expose implementation-specific statistics or metrics. It might also touch other areas, which could broaden the scope enough to support something like the PIP-474 implementation. Covering "overflow ML" could require several other plugin interfaces so that the lifecycle of the ML can be tied to the subscription and the topic. Adding such interfaces deserves more attention. Coupling the "overflow ML" directly to other internals wouldn't be a great idea. Pulsar has multiple listener interfaces, but they likely don't cover every need. One way to plug into Pulsar today is by implementing a TopicFactory (configurable via topicFactoryClassName). Implementing a TopicFactory and creating a custom subclass of PersistentTopic is one way to hook into the required extension points from custom plugins. Many hooks might be missing for easily extending up to the broker side Consumer level if that happens to be required. A simpler approach would obviously be to have lifecycle interfaces, so that it wouldn't be necessary to go through the pain of implementing a TopicFactory, a custom PersistentTopic, and so on, just to get notified about topic and subscription deletions so the "overflow ML" can be cleaned up. Looking forward to your opinion about first focusing on the "pluggable dispatcher" part. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
