Sharing some more thoughts. We could alternatively use two topics
instead of one. In this design, the first topic is the unfiltered
write-ahead log that represents many writers (brokers) trying to
acquire ownership of bundles. The second topic is the distilled log
that represents the "winners" or the "owners" of the bundles. There is
a single writer, the leader broker, that reads from the input topic
and writes to the output topic. The first topic is a regular
(non-compacted) topic, and the second is compacted.
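
As a rough sketch of the leader's role in this design (plain Java with
in-memory stand-ins for the two topics, not Pulsar APIs; the
OwnershipLeader class and its method names are made up for
illustration): the leader handles each claim it reads from the input
topic and only writes the first claim for an unowned bundle to the
output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: in-memory stand-ins for the two topics, not Pulsar APIs.
final class OwnershipLeader {
    // Latest state of the compacted output topic: bundle -> owning broker.
    private final Map<String, String> owners = new HashMap<>();
    // Append-only record of what the leader wrote to the output topic.
    private final List<String> outputLog = new ArrayList<>();

    // Handle one claim read from the input topic; returns true if the claim won.
    boolean onClaim(String bundle, String broker) {
        // putIfAbsent returns null only when the bundle had no owner yet.
        if (owners.putIfAbsent(bundle, broker) != null) {
            return false; // already owned: the claim is dropped, nothing is written
        }
        outputLog.add(bundle + "=" + broker);
        return true;
    }

    String ownerOf(String bundle) {
        return owners.get(bundle);
    }

    List<String> outputLog() {
        return outputLog;
    }
}
```

Releases and transfers of ownership are left out here; they would need
their own event types on the input topic.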

The primary benefit of a two-topic solution is that it is easy for the
leader broker to trade off ownership without needing to slow down
writes to the input topic. The leader broker will start consuming from
the input topic when it has fully consumed the table view on the
output topic. In general, I don't think consumers know when they have
"reached the end of a table view", but we should be able to trivially
figure this out if we are the topic's only writer and the topic and
writer are collocated on the same broker.
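
One way the "caught up" check could look, assuming the new leader can
learn the offset of the last entry written to the output topic at
takeover (the CatchUpTracker class and the use of plain long offsets
are assumptions for illustration, not existing Pulsar APIs):

```java
// Hypothetical sketch: tracks whether the table view has replayed the
// output topic up to the last entry that existed at takeover.
final class CatchUpTracker {
    // Offset of the last entry in the output topic at takeover; -1 if empty.
    private final long lastWrittenOffset;
    // Highest offset applied to the table view so far; -1 if none.
    private long lastAppliedOffset = -1;

    CatchUpTracker(long lastWrittenOffset) {
        this.lastWrittenOffset = lastWrittenOffset;
    }

    // Call after each output-topic entry is applied to the table view.
    void onEntryApplied(long offset) {
        lastAppliedOffset = Math.max(lastAppliedOffset, offset);
    }

    // True once every entry that existed at takeover has been applied.
    boolean isCaughtUp() {
        return lastAppliedOffset >= lastWrittenOffset;
    }
}
```

The leader would only start consuming from the input topic once
isCaughtUp() returns true. Because it is the only writer, no new
entries can land on the output topic in the meantime.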

In that design, it might make sense to use something like the
replication cursor to keep track of this consumer's state.

- Michael

On Fri, Oct 28, 2022 at 5:12 PM Michael Marshall <mmarsh...@apache.org> wrote:
>
> Thanks for your proposal, Heesung.
>
> Fundamentally, we have the problems listed in this PIP because we have
> multiple writers instead of just one writer. Can we solve this problem
> by changing our write pattern? What if we use the leader broker as the
> single writer? That broker would intercept attempts to acquire
> ownership on bundles and would grant ownership to the first broker to
> claim an unassigned bundle. It could "grant ownership" by letting the
> first write to claim an unassigned bundle get written to the ownership
> topic. When a bundle is already owned, the leader won't persist that
> event to BookKeeper. In this design, the log becomes a true
> ownership log, which will correctly work with the existing topic
> compaction and table view solutions. My proposal essentially moves the
> conflict resolution to just before the write, and as a consequence, it
> greatly reduces the need for post processing of the event log. One
> trade off might be that the leader broker could slow down the write
> path, but given that the leader would just need to verify the current
> state of the bundle, I think it'd be performant enough.
>
> Additionally, we'd need the leader broker to be "caught up" on bundle
> ownership in order to grant ownership of topics, but unless I am
> mistaken, that is already a requirement of the current PIP 192
> paradigm.
>
> Below are some additional thoughts that will be relevant if we move
> forward with the design as it is currently proposed.
>
> I think it might be helpful to update the title to show that this
> proposal will also affect table view. I didn't catch that at
> first.
>
> Do you have any documentation describing how the
> TopicCompactionStrategy will determine which states are valid in the
> context of load balancing? I looked at
> https://github.com/apache/pulsar/pull/18195, but I couldn't seem to
> find anything for it. That would help make this proposal less
> abstract.
>
> The proposed API seems very tied to the needs of PIP 192. For example,
> `isValid` is not a term I associate with topic compaction. The
> fundamental question for compaction is which value to keep (or whether
> to build a new value). I think we might be able to simplify the API by
> replacing the "isValid", "isMergeEnabled", and "merge" methods with a
> single method that lets the implementation handle one or all tasks.
> That would also remove the need to deserialize payloads multiple times.
>
> I also feel like mentioning that after working with the PIP 105 broker
> side filtering, I think we should avoid running UDFs in the broker as
> much as possible. (I do not consider the load balancing logic to be a
> UDF here.) I think it would be worth not making this a user facing
> feature unless there is demand for real use cases.
>
> Thanks!
> Michael
>
> On Fri, Oct 28, 2022 at 1:21 AM 丛搏 <bog...@apache.org> wrote:
> >
> > +1(non-binding)
> >
> > thanks,
> > bo
> >
> > On Wed, Oct 19, 2022 at 07:54 Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote:
> > >
> > > Hi pulsar-dev community,
> > >
> > > > I raised a PIP to discuss: PIP-215: Configurable Topic Compaction
> > > > Strategy
> > >
> > > PIP link: https://github.com/apache/pulsar/issues/18099
> > >
> > > Regards,
> > > Heesung
