Regarding the namespace "terminated" concept, I just noticed that we
already have a "deleted" field in a namespace's policies [0]. There is
even a comment that says:

> // set the policies to deleted so that somebody else cannot acquire this 
> namespace

I am not familiar with this feature, but it seems like this policy
field could be checked before creating a topic in a namespace. That
would remove certain races described above.

As I think about it more, I no longer think "terminated" is the right
term for what I proposed above. Our goal is to briefly prevent any
topic creation to ensure we can delete all sub resources for a
namespace. On the other hand, a terminated topic isn't necessarily
short lived. If we want to apply the "terminated" term unequivocally
to both topics and namespaces, I think a terminated namespace would
need to be a namespace where all topics are in terminated state and no
additional topics could be created. That's not the feature we're
discussing here, though. Deleted seems like the right term to me,
especially since we're already using it to prevent a race condition
[0].

Thanks,
Michael

[0] 
https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262

On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <mmarsh...@apache.org> wrote:
>
> Hi Dave,
>
> > automatically delete tenants and namespaces for not containing topics
>
> I don't think that is what we are discussing. I agree that the initial
> email says just that, though, which is why I asked above:
>
> >> When there are no user-created topics under a namespace,
> >> Namespace should be deleted.
> > This is supposed to mean that the namespace should be able to be
> > deleted, correct?
>
> Perhaps Mattison can clarify, too.
>
> My current understanding of the context for the PIP is that a user
> call to delete a namespace without force can fail when a producer
> reconnects to a deleted topic. The goal is to remove the race
> condition to ensure namespace deletion can succeed.
>
> Thanks,
> Michael
>
> On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <w...@apache.org> wrote:
> >
> > Hi -
> >
> > I hope I’m understanding what’s being discussed.
> >
> > If we are going to automatically delete tenants and namespaces for not 
> > containing topics then we need to make both of these automatic actions 
> > configurable with a default to NOT do so. Otherwise we break existing use 
> > cases.
> >
> > Automatic deletion of namespaces should be configurable at both the cluster 
> > and tenant level.
> >
> > Regards,
> > Dave
> >
> > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <mmarsh...@apache.org> 
> > > wrote:
> > >
> > >> The old producer/consumer should be closed after applying the changes 
> > >> from
> > >> this proposal.
> > >
> > > Penghui, are you suggesting that we implement the namespace/tenant
> > > terminated logic after completing this PIP?
> > >
> > > For the sake of discussion, if we implement the namespace terminated
> > > logic first, we could fulfill the underlying requirements for this PIP
> > > by returning a new non-retriable error response when a client tries to
> > > connect a producer or a consumer to a topic in a namespace that is
> > > "terminated". If we do add the "namespace terminated" feature, we'll
> > > need to add a non-retriable exception for this case, anyway. The main
> > > advantage here is that we'd only need one expansion of the protobuf
> > > instead of two. The downside is that the protocol for connected
> > > clients has a couple more roundtrips. The broker would disconnect
> > > connected clients and then fail their reconnection attempt with a
> > > non-retriable error.
> > >
> > > Thanks,
> > > Michael
> > >
> > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <peng...@apache.org> wrote:
> > >>
> > >>> If we want to solve this problem, we need to add some sync resources 
> > >>> like
> > >> lock/state, I think it’s a harm for us, we don’t need to do that.
> > >>
> > >> I think we can make the namespace/tenants to the inactive state first so
> > >> that we can avoid any new
> > >> producer/consumer connect to the topic under the namespace/tenant.
> > >>
> > >> The old producer/consumer should be closed after applying the changes 
> > >> from
> > >> this proposal.
> > >>
> > >> Thanks,
> > >> Penghui
> > >>
> > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <mattisonc...@gmail.com> 
> > >> wrote:
> > >>
> > >>>> This is supposed to mean that the namespace should be able to be
> > >>>> deleted, correct?
> > >>>
> > >>> Yes, the main background is the user doesn’t have an active topic. so,
> > >>> they want to delete the namespace.
> > >>>
> > >>>> However, I think
> > >>>> we might still have a race condition that could make tenant or
> > >>>> namespace deletion fail. Specifically, if a new producer or consumer
> > >>>> creates a topic after the namespace deletion has started but
> > >>>> before it is complete. Do you agree that the underlying race still
> > >>> exists?
> > >>>
> > >>> Yes, this condition exists. I think it’s not a big problem because the
> > >>> user doesn’t want to use this namespace anymore.
> > >>> If this scenario appears, they will get an error and need to delete it
> > >>> again.
> > >>>
> > >>>> What if we expand our usage of the "terminated" feature to apply to
> > >>>> namespaces (and tenants)? Then, a terminated namespace can have
> > >>>> bundles and topics can be deleted but not created (just as a terminated
> > >>>> topic cannot have any new messages published to it). This would take
> > >>>> care of all topic creation race conditions. We'd probably need to add
> > >>>> new protobuf exceptions for this feature.
> > >>>
> > >>>
> > >>> If we want to solve this problem, we need to add some sync resources 
> > >>> like
> > >>> lock/state, I think it’s a harm for us, we don’t need to do that.
> > >>>
> > >>> Thanks for your suggestions, let me know what you think.
> > >>>
> > >>> Best,
> > >>> Mattison
> > >>>
> > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <mmarsh...@apache.org>
> > >>> wrote:
> > >>>>
> > >>>> This proposal identifies an important issue that we should definitely
> > >>>> solve. I have some questions.
> > >>>>
> > >>>>> When there are no user-created topics under a namespace,
> > >>>>> Namespace should be deleted.
> > >>>>
> > >>>> This is supposed to mean that the namespace should be able to be
> > >>>> deleted, correct?
> > >>>>
> > >>>>> For this reason, we need to close the system topic reader/producer
> > >>>>> first, then remove the system topic. finally, remove the namespace.
> > >>>>
> > >>>> I agree that expanding the protobuf CloseProducer and CloseConsumer
> > >>>> commands could be valuable here. The expansion would ensure that
> > >>>> producers and consumers don't attempt to reconnect. However, I think
> > >>>> we might still have a race condition that could make tenant or
> > >>>> namespace deletion fail. Specifically, if a new producer or consumer
> > >>>> creates a topic after the namespace deletion has started but
> > >>>> before it is complete. Do you agree that the underlying race still
> > >>> exists?
> > >>>>
> > >>>> In my view, the fundamental problem here is that deleting certain 
> > >>>> Pulsar
> > >>>> resources takes time and, in a distributed system, that means race
> > >>>> conditions.
> > >>>>
> > >>>> What if we expand our usage of the "terminated" feature to apply to
> > >>>> namespaces (and tenants)? Then, a terminated namespace can have
> > >>>> bundles and topics can be deleted but not created (just as a terminated
> > >>>> topic cannot have any new messages published to it). This would take
> > >>>> care of all topic creation race conditions. We'd probably need to add
> > >>>> new protobuf exceptions for this feature.
> > >>>>
> > >>>> Thanks,
> > >>>> Michael
> > >>>>
> > >>>>
> > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > >>>> <zky...@streamnative.io.invalid> wrote:
> > >>>>>
> > >>>>> +1
> > >>>>>
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Zike
> > >>>>>
> > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <techno...@apache.org>
> > >>> wrote:
> > >>>>>>
> > >>>>>> Hi
> > >>>>>> The PIP link : https://github.com/apache/pulsar/issues/13989
> > >>>>>>
> > >>>>>> Regards
> > >>>>>> Jiwei Guo (Tboy)
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao 
> > >>>>>> <mattisonc...@gmail.com
> > >>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hello everyone,
> > >>>>>>>
> > >>>>>>> I want to start a discussion about PIP-139 : Support Broker send
> > >>> command
> > >>>>>>> to real close producer/consumer.
> > >>>>>>>
> > >>>>>>> This is the PIP document
> > >>>>>>>
> > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > >>>>>>>
> > >>>>>>> Please check it out and feel free to share your thoughts.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Mattison
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ———————— Pasted below for quoting convenience.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Relation pull request:  #13337
> > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > >>>>>>>
> > >>>>>>> ## Motivation
> > >>>>>>>
> > >>>>>>> Before we discuss this pip, I'd like to supplement some context to
> > >>> help
> > >>>>>>> contributors who don't want to read the original pull request.
> > >>>>>>>
> > >>>>>>>> When there are no user-created topics under a namespace, Namespace
> > >>>>>>> should be deleted. But currently, the system topic existed and the
> > >>>>>>> reader/producer could auto-create the system which may cause the
> > >>> namespace
> > >>>>>>> deletion to fail.
> > >>>>>>>
> > >>>>>>> For this reason, we need to close the system topic reader/producer
> > >>> first,
> > >>>>>>> then remove the system topic. finally, remove the namespace.
> > >>>>>>>
> > >>>>>>> Following this way, we first want to use ``terminate`` to solve this
> > >>>>>>> problem. then we found producers can disconnect, but consumers are
> > >>> still
> > >>>>>>> alive. So, another PR #13960 wants to add consumers' closing logic.
> > >>>>>>>
> > >>>>>>> After #13960, all things look good, but another problem appears. 
> > >>>>>>> that
> > >>> is
> > >>>>>>> we need to wait until consumers completely consume all messages 
> > >>>>>>> (this
> > >>> may
> > >>>>>>> make terminate topic so long and the operation timeout)then get
> > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > >>>>>>>
> > >>>>>>> In the #13337 case, we need to force close consumers immediately. So
> > >>> we
> > >>>>>>> write this PIP to discuss another way to resolve this problem.
> > >>>>>>>
> > >>>>>>> ## Goal
> > >>>>>>>
> > >>>>>>> We can add a new field(`allow_reconnect`) in command
> > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> > >>>>>>> producer/consumers immediately.
> > >>>>>>>
> > >>>>>>> ## API Changes
> > >>>>>>>
> > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> > >>>>>>>
> > >>>>>>> ```java
> > >>>>>>> **Before**
> > >>>>>>>
> > >>>>>>> message CommandCloseProducer {
> > >>>>>>>   required uint64 producer_id = 1;
> > >>>>>>>   required uint64 request_id = 2;
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> **After**
> > >>>>>>> message CommandCloseProducer {
> > >>>>>>>   required uint64 producer_id = 1;
> > >>>>>>>   required uint64 request_id = 2;
> > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > >>>>>>> }
> > >>>>>>> ```
> > >>>>>>>
> > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> > >>>>>>> ```java
> > >>>>>>> **Before**
> > >>>>>>>
> > >>>>>>> message CommandCloseConsumer {
> > >>>>>>>   required uint64 consumer_id = 1;
> > >>>>>>>   required uint64 request_id = 2;
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> **After**
> > >>>>>>> message CommandCloseConsumer {
> > >>>>>>>   required uint64 consumer_id = 1;
> > >>>>>>>   required uint64 request_id = 2;
> > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > >>>>>>> }
> > >>>>>>> ```
> > >>>>>>>
> > >>>>>>> ## Implementation
> > >>>>>>>
> > >>>>>>> ### ClientCnx - Producer:
> > >>>>>>>
> > >>>>>>> **Before**
> > >>>>>>> ```java
> > >>>>>>>   @Override
> > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> > >>> closeProducer)
> > >>>>>>> {
> > >>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
> > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > >>>>>>>       final long producerId = closeProducer.getProducerId();
> > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> > >>>>>>>       if (producer != null) {
> > >>>>>>>           producer.connectionClosed(this);
> > >>>>>>>       } else {
> > >>>>>>>           log.warn("Producer with id {} not found while closing
> > >>> producer
> > >>>>>>> ", producerId);
> > >>>>>>>       }
> > >>>>>>>   }
> > >>>>>>> ```
> > >>>>>>> After:
> > >>>>>>> ```java
> > >>>>>>>  @Override
> > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> > >>> closeProducer)
> > >>>>>>> {
> > >>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
> > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > >>>>>>>       final long producerId = closeProducer.getProducerId();
> > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> > >>>>>>>       if (producer != null) {
> > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > >>>>>>>               producer.connectionClosed(this);
> > >>>>>>>           } else {
> > >>>>>>>               producer.closeAsync();
> > >>>>>>>           }
> > >>>>>>>       } else {
> > >>>>>>>           log.warn("Producer with id {} not found while closing
> > >>> producer
> > >>>>>>> ", producerId);
> > >>>>>>>       }
> > >>>>>>>   }
> > >>>>>>> ```
> > >>>>>>> ### ClientCnx - Consumer:
> > >>>>>>>
> > >>>>>>> **Before**
> > >>>>>>> ```java
> > >>>>>>>   @Override
> > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> > >>> closeConsumer)
> > >>>>>>> {
> > >>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
> > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> > >>>>>>>       if (consumer != null) {
> > >>>>>>>           consumer.connectionClosed(this);
> > >>>>>>>       } else {
> > >>>>>>>           log.warn("Consumer with id {} not found while closing
> > >>> consumer
> > >>>>>>> ", consumerId);
> > >>>>>>>       }
> > >>>>>>>   }
> > >>>>>>> ```
> > >>>>>>> **After**
> > >>>>>>> ```java
> > >>>>>>>   @Override
> > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> > >>> closeConsumer)
> > >>>>>>> {
> > >>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
> > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> > >>>>>>>       if (consumer != null) {
> > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > >>>>>>>               consumer.connectionClosed(this);
> > >>>>>>>           } else {
> > >>>>>>>               consumer.closeAsync();
> > >>>>>>>           }
> > >>>>>>>       } else {
> > >>>>>>>           log.warn("Consumer with id {} not found while closing
> > >>> consumer
> > >>>>>>> ", consumerId);
> > >>>>>>>       }
> > >>>>>>>   }
> > >>>>>>> ```
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ## Reject Alternatives
> > >>>>>>>
> > >>>>>>> none.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Zike Yang
> > >>>
> > >>>
> >

Reply via email to