Hi Penghui,

I added one comment below.

On Mon, Aug 21, 2023 at 6:12 AM PengHui Li <peng...@apache.org> wrote:

> > However, with this optional field approach, we don't need to introduce a
> new protocol version, which could be arguably beneficial.
> I will leave the decision to the community. Let me know if I have to use
> new commands and increase the protocolVersion for this change.
>
> I don't see any drawbacks of the optional fields.
> The protocolVersion changs usually happen when
> a new command is introduced. You must have a
> version check to decide if you can deliver that command
> from broker to client or from client to broker. Otherwise,
> the broker or client will receive an unknown command.
>
> For the newly introduced fields, it will break nothing.
> So I don't think the protocolVersion is required.
> And we should also avoid too many commands
> like CommandA_v1, CommandA_v2, CommandA_v3
> but the only difference is they have different optional
> fields.
>
> > Sorry, I am not sure if I understood your question here, but
> When the destination or source broker crashes in the middle,
> the leader will find the orphan state and clean the ownership by selecting
> a new owner, and the client will reconnect to it.
> During this transfer process, if alive, the source broker will serve the
> topic according to the protocol described in the PIP.
>
> If only the bundle owner will perform the bundle unloading.
> It's fine. If not, I mean only the leader knows how many bundles
> should be unloaded(after the owner changed).
> If the leader crashed, how the next leader
> knows how many bundles should be unloaded.
> I have confirmed with Kai, he said only the bundle owner will
> perform the bundle unloading. So, it looks fine.
>

Actually, this is incorrect. Currently, the leader broker globally triggers
unloading.
When the leader crahshes, the new leader broker will delay unloading
until it collects sufficient broker load data and bundle load data.


> This PIP is based on PIP-192 because the proposed protocol change is based
> on the bundle states from PIP-192.
> I assume the community could raise other PIPs to utilize the close producer
> and consumer command changes for other logic too.
>
> Yes, I just want to have a comprehensive view in addition to PIP-192.
> To check if it can also improve the topic lookup without PIP-192 because
> users can select different load balancers.
>
> One possible solution is the client side sends the lookup
> request with the authoritative field based on the new owner.
>
> Regards,
> Penghui
>
>
> On Fri, Aug 18, 2023 at 11:29 AM Michael Marshall <mmarsh...@apache.org>
> wrote:
>
> > > However, with this optional field approach, we don't need to introduce
> a
> > > new protocol version, which could be arguably beneficial.
> >
> > I don't see versioning the protocol as a negative as long as there is
> > sufficient benefit. In this case, the new command serves an explicit
> > purpose to make the protocol more reactive (the client knows what is
> > happening as it happens without any need to poll) and less wasteful
> > (no unnecessary data transfer).
> >
> > One reason I want to push on this design is because I think it will be
> > harder to change once it is in place. Adding the new broker URL to the
> > close producer and close consumer commands necessarily couples the
> > events. Using separate commands allows them to be decoupled, and in my
> > opinion, that matches the reality of the problem. We know we want to
> > close the producer/consumer before we know where they should connect
> > next.
> >
> > > I will leave the decision to the community.
> >
> > I do not speak for the community (hopefully that's obvious), but I am
> > a community member advocating for a slightly more complex solution
> > that, in my opinion, will solve one part of the transfer problem. :) I
> > don't think anyone is going to say "you have to use new commands". I
> > think the goal of this PIP discussion is to come to a consensus so we
> > can move forward confidently.
> >
> > > I will be replying to the PR.
> >
> > Thanks, that discussion has been very productive!
> >
> > Thanks,
> > Michael
> >
> > On Thu, Aug 10, 2023 at 7:29 PM Heesung Sohn
> > <heesung.s...@streamnative.io.invalid> wrote:
> > >
> > > On Tue, Aug 8, 2023 at 9:56 PM Michael Marshall <mmarsh...@apache.org>
> > > wrote:
> > >
> > > > Thanks for your proposal, Heesung. Sorry for my late response. Just
> > > > want to make a few points to hopefully improve the implementation.
> > > > Overall, I think this feature is absolutely the right direction for
> > > > Pulsar and will be a great improvement. I remember discussing this at
> > > > some community meetings last year. See the last paragraph of [0].
> > > >
> > > > In general, my primary question is about the protocol changes you're
> > > > proposing, both the commands and the way the commands are sent.
> > > >
> > > > > I am worried about the backward compatibility if we add a new proto
> > > > message
> > > > > -- how can the old versions of clients handle this new proto
> message?
> > > >
> > > > The protocolVersion [1] object in the proto definition is how we
> > > > guarantee backwards and forwards compatibility for all combinations
> of
> > > > brokers/clients. The client and the broker each share their version
> > > > during the handshake and then they only use commands known to both
> > > > peers.
> > > >
> > > > We can introduce a new command here if we want/need. The nuance is
> > > > that the server logic will branch depending on the client's
> > > > protocolVersion.
> > > >
> > >
> > > Good Point. We can introduce new commands here instead of the optional
> > > fields by branching the protocolVersion.
> > > If the community thinks the new commands are cleaner, I can create new
> > > commands for this, as the protocolVersion guarantees the
> > > backward-compatbility.
> > > However, with this optional field approach, we don't need to introduce
> a
> > > new protocol version, which could be arguably beneficial.
> > > I will leave the decision to the community. Let me know if I have to
> use
> > > new commands and increase the protocolVersion for this change.
> > >
> > >
> > >
> > > > Next, I am wondering if there are any opportunities to decrease the
> > > > latency even more than are proposed in the PIP by introducing some
> > > > concurrency to decouple operations like creating new connections and
> > > > initializing resources used by each topic (like a managed ledger).
> > > > Because I used a mermaid diagram, I put the majority of this point in
> > > > a PR comment [2].
> > > >
> > >
> > >  I will be replying to the PR.
> > >
> > >
> > > > Thanks,
> > > > Michael
> > > >
> > > > [0] https://lists.apache.org/thread/ghlsprmqgkt8lfv634groy93c5dm1k1g
> > > > [1]
> > > >
> >
> https://github.com/apache/pulsar/blob/15453964f6e6086b0c5e34736958cf749f16f5e1/pulsar-common/src/main/proto/PulsarApi.proto#L242-L268
> > > > [2]
> https://github.com/apache/pulsar/pull/20748#discussion_r1287927655
> > > >
> > > > On Fri, Aug 4, 2023 at 3:41 PM Heesung Sohn
> > > > <heesung.s...@streamnative.io.invalid> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I updated the PIP to add more details about the recent discussion.
> > > > >
> > > > > Thanks,
> > > > > Heesung
> > > > >
> > > > > On Thu, Aug 3, 2023 at 6:46 PM Heesung Sohn <
> > > > heesung.s...@streamnative.io>
> > > > > wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 2, 2023 at 8:02 PM PengHui Li <peng...@apache.org>
> > wrote:
> > > > > >
> > > > > >> > One of the advantages of this proposal (add these optional
> > > > dstBrokerUrl
> > > > > >> fields in the existing CommandCloseProducer and
> > CommandCloseConsumer)
> > > > is
> > > > > >> the backward compatibility
> > > > > >>
> > > > > >> Ah, yes. I missed this point. It's a good convince.
> > > > > >>
> > > > > >> > After the client connects to the destination broker, the next
> > > > > >> command(e.g.
> > > > > >> > ProducerCommand) requires
> > > > > >> > the destination broker to check the ownership again against
> its
> > > > local
> > > > > >> table
> > > > > >> > view of the bundle state channel.
> > > > > >>
> > > > > >> > Upon this local ownership check, there could be the following
> > > > scenarios:
> > > > > >>
> > > > > >> > Happy case:
> > > > > >> > - If the ownership state is `owned ` and the current broker is
> > > > indeed
> > > > > >> the
> > > > > >> > owner, the command completes.
> > > > > >> > Unhappy cases:
> > > > > >> > - If the ownership state is `owned ` and the current broker is
> > not
> > > > the
> > > > > >> > owner, the command fails
> > > > > >> > (the broker returns an error to the client), and the client
> > tries to
> > > > > >> find
> > > > > >> > the true new owner by lookups.
> > > > > >> > The global bundle state channel is eventually consistent, and
> > the
> > > > > >> lookups
> > > > > >> > should eventually converge.
> > > > > >> > - if the ownership change is still in progress(`releasing`,
> > > > > >> `assigning`),
> > > > > >> > this check will be deferred
> > > > > >> > until the state becomes `owned` with a timeout.
> > > > > >>
> > > > > >> Could you please also update the explanation in the proposal?
> > > > > >> It looks like what is guaranteed and what is not in the new
> > proposal.
> > > > > >> And why the mechanism can work with the unhappy case.
> > > > > >>
> > > > > >> Both the existing mechanism and the new mechanism follow the
> > > > > >> eventual consistency. The nuance is that the new mechanism will
> > change
> > > > > >> the bundle owner first and then unload the topics. For the
> broker
> > > > logs,
> > > > > >> will more error messages related to the fenced ledger be
> > introduced?
> > > > > >> Or the client reconnects will happen after the owner cache is
> > updated?
> > > > > >>
> > > > > >>
> > > > > > As described in the protocol change in the PIP, the goal is to
> have
> > > > fewer
> > > > > > fenced ledger errors because the protocol change helps the client
> > > > > > reconnection(open ledger)
> > > > > >  happen at the very end.
> > > > > >
> > > > > >
> > > > > >> If the load manager crashed before unloading the topics (close
> > > > producers
> > > > > >> and consumers).
> > > > > >> Will the old broker provide service for this topic until the
> > client
> > > > > >> triggers a
> > > > > >> new lookup?
> > > > > >>
> > > > > >
> > > > > > Sorry, I am not sure if I understood your question here, but
> > > > > > When the destination or source broker crashes in the middle,
> > > > > > the leader will find the orphan state and clean the ownership by
> > > > selecting
> > > > > > a new owner, and the client will reconnect to it.
> > > > > > During this transfer process, if alive, the source broker will
> > serve
> > > > the
> > > > > > topic according to the protocol described in the PIP.
> > > > > >
> > > > > >
> > > > > >> Sorry, Another question about PIP-192. Is it required by the new
> > > > > >> mechanism?
> > > > > >> Will the situation also be improved without enabling PIP-192?
> > > > > >> We have added the new owner broker to the close producer and
> > consumer
> > > > > >> command.
> > > > > >> We can also update the owner and then close the producer and
> > consumers
> > > > > >> with
> > > > > >> the new owner.
> > > > > >>
> > > > > >
> > > > > > This PIP is based on PIP-192 because the proposed protocol change
> > is
> > > > based
> > > > > > on the bundle states from PIP-192.
> > > > > > I assume the community could raise other PIPs to utilize the
> close
> > > > > > producer and consumer command changes for other logic too.
> > > > > >
> > > > > >
> > > > > >>
> > > > > >> Regards,
> > > > > >> Penghui
> > > > > >>
> > > > > >> On Thu, Aug 3, 2023 at 4:38 AM Heesung Sohn
> > > > > >> <heesung.s...@streamnative.io.invalid> wrote:
> > > > > >>
> > > > > >> > Hi PengHui,
> > > > > >> > Thank you for your questions. Please find my answers inline.
> > > > > >> >
> > > > > >> > On Wed, Aug 2, 2023 at 3:47 AM PengHui Li <peng...@apache.org
> >
> > > > wrote:
> > > > > >> >
> > > > > >> > > Hi Heesung,
> > > > > >> > >
> > > > > >> > > I'm sorry for not getting back to you sooner.
> > > > > >> > > The motivation and the solution look good to me.
> > > > > >> > >
> > > > > >> > > I just want to leave some comments to make the proposal
> clear
> > > > > >> > > for users and developers.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > >
> >
> ----------------------------------------------------------------------------------------------------
> > > > > >> > >
> > > > > >> > > We should have a section to describe the values that will be
> > added
> > > > > >> > > by this proposal.
> > > > > >> > >
> > > > > >> > > - Reduced the publish latency spike and e2e latency spike
> > during
> > > > the
> > > > > >> > > reconnecting
> > > > > >> > > - Mitigated bottleneck(lookup) of a large number of topics
> in
> > a
> > > > > >> cluster
> > > > > >> > >
> > > > > >> > > Maybe more, I can only think about the above two values.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > Good point. I will clarify this part in the PIP.
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > >
> >
> ----------------------------------------------------------------------------------------------------
> > > > > >> > >
> > > > > >> > > For the proto changes, the client side can also use the HTTP
> > > > lookup
> > > > > >> > > service.
> > > > > >> > > We should also mention it in this proposal. Instead of
> adding
> > > > four new
> > > > > >> > > fields to
> > > > > >> > > the CommandCloseProducer and CommandCloseConsumer, maybe
> > adding a
> > > > new
> > > > > >> > > message BrokerIdentifier, or others is better.
> > > > > >> > >
> > > > > >> >
> > > > > >> > I am worried about the backward compatibility if we add a new
> > proto
> > > > > >> message
> > > > > >> > -- how can the old versions of clients handle this new proto
> > > > message?
> > > > > >> >
> > > > > >> > One of the advantages of this proposal (add these optional
> > > > dstBrokerUrl
> > > > > >> > fields in the existing CommandCloseProducer and
> > > > CommandCloseConsumer) is
> > > > > >> > the backward compatibility -- older clients can still work in
> > the
> > > > older
> > > > > >> way
> > > > > >> > by going through the lookup requests.
> > > > > >> >
> > > > > >> > Ref: https://protobuf.dev/overview/#updating-defs
> > > > > >> > Updating Proto Definitions Without Updating Code
> > > > > >> > <https://protobuf.dev/overview/#updating-defs>
> > > > > >> >
> > > > > >> > It’s standard for software products to be backward compatible,
> > but
> > > > it is
> > > > > >> > less common for them to be forward compatible. As long as you
> > follow
> > > > > >> > some simple
> > > > > >> > practices <
> > https://protobuf.dev/programming-guides/proto3/#updating
> > > > >
> > > > > >> when
> > > > > >> > updating .proto definitions, old code will read new messages
> > without
> > > > > >> > issues, ignoring any newly added fields. ...
> > > > > >> >
> > > > > >> >
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > >
> >
> ----------------------------------------------------------------------------------------------------
> > > > > >> > >
> > > > > >> > > There are two points we need to clarify in the proposal.
> > > > > >> > >
> > > > > >> > > - If the client reconnects before the new owner takes
> > ownership.
> > > > If
> > > > > >> the
> > > > > >> > > producer closed
> > > > > >> > >   after the new broker took ownership, would it be a
> potential
> > > > > >> > consistency
> > > > > >> > > issue?
> > > > > >> >
> > > > > >> > - If the owner changed again before the client reconnection.
> > How did
> > > > > >> Pulsar
> > > > > >> > > handle this case?
> > > > > >> > >    As I understand, it's fine because the broker will check
> > the
> > > > > >> ownership
> > > > > >> > > again when handling
> > > > > >> > >    producer and consumer creation. If it can be explained in
> > the
> > > > > >> > proposal,
> > > > > >> > > it should be great
> > > > > >> > >    for developers to understand it deeply.
> > > > > >> > >
> > > > > >> >
> > > > > >> > After the client connects to the destination broker, the next
> > > > > >> command(e.g.
> > > > > >> > ProducerCommand) requires
> > > > > >> > the destination broker to check the ownership again against
> its
> > > > local
> > > > > >> table
> > > > > >> > view of the bundle state channel.
> > > > > >> >
> > > > > >> > Upon this local ownership check, there could be the following
> > > > scenarios:
> > > > > >> >
> > > > > >> > Happy case:
> > > > > >> > - If the ownership state is `owned ` and the current broker is
> > > > indeed
> > > > > >> the
> > > > > >> > owner, the command completes.
> > > > > >> > Unhappy cases:
> > > > > >> > - If the ownership state is `owned ` and the current broker is
> > not
> > > > the
> > > > > >> > owner, the command fails
> > > > > >> > (the broker returns an error to the client), and the client
> > tries to
> > > > > >> find
> > > > > >> > the true new owner by lookups.
> > > > > >> > The global bundle state channel is eventually consistent, and
> > the
> > > > > >> lookups
> > > > > >> > should eventually converge.
> > > > > >> > - if the ownership change is still in progress(`releasing`,
> > > > > >> `assigning`),
> > > > > >> > this check will be deferred
> > > > > >> > until the state becomes `owned` with a timeout.
> > > > > >> >
> > > > > >> >
> > > > > >> > > Regards,
> > > > > >> > > Penghui
> > > > > >> > >
> > > > > >> > > On Wed, Aug 2, 2023 at 1:18 AM Heesung Sohn
> > > > > >> > > <heesung.s...@streamnative.io.invalid> wrote:
> > > > > >> > >
> > > > > >> > > > Hi,
> > > > > >> > > >
> > > > > >> > > > It has been over three weeks since this discussion
> started,
> > and
> > > > I
> > > > > >> > replied
> > > > > >> > > > to the questions in the PIP.
> > > > > >> > > > I will send out a vote email tomorrow if looking good.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Heesung
> > > > > >> > > >
> > > > > >> > > > On Tue, Jul 25, 2023 at 1:12 PM Heesung Sohn <
> > > > > >> > > heesung.s...@streamnative.io
> > > > > >> > > > >
> > > > > >> > > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi,
> > > > > >> > > > >
> > > > > >> > > > > I replied to the comments from Kai and Ran in the PIP.
> > > > > >> > > > >
> > > > > >> > > > > Please review the PIP by any chance.
> > > > > >> > > > >
> > > > > >> > > > > Thank you,
> > > > > >> > > > > Heesung
> > > > > >> > > > >
> > > > > >> > > > > On Thu, Jul 6, 2023 at 4:32 PM Heesung Sohn <
> > > > > >> > > > heesung.s...@streamnative.io>
> > > > > >> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > >> Hi dev,
> > > > > >> > > > >>
> > > > > >> > > > >> I proposed this PIP,
> > > > https://github.com/apache/pulsar/pull/20748
> > > > > >> ,
> > > > > >> > to
> > > > > >> > > > >> make unloaded clients directly and gracefully connect
> to
> > new
> > > > > >> owner
> > > > > >> > > > brokers
> > > > > >> > > > >> without lookups.
> > > > > >> > > > >>
> > > > > >> > > > >> Please take a look and let me know what you think.
> > > > > >> > > > >>
> > > > > >> > > > >> Regards,
> > > > > >> > > > >> Heesung
> > > > > >> > > > >>
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > >
> >
>

Reply via email to