Thanks for your response, Penghui.

I support simplifying message loss prevention for DLQ topics. However,
it's not clear to me why we should only simplify it for DLQ topics.

As a Pulsar user, I encountered many of the challenges you mention
when producing to auto created topics. In my architecture, I had
consumers reading from an input topic, transforming the data, and then
producing to an arbitrary number of output topics. My business logic
required that I not lose any messages, which is essentially the same
expectation from DLQ users here. I ended up increasing the retention
time to about 4 hours on the output topics to minimize the possibility
of losing data, and I had to scale up my BookKeeper cluster to handle the
extra retention. If I had been able to ensure that my auto created topics
would not delete messages before I created my subscriptions, I could
have run with no retention policy and a smaller bookie cluster.

> Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> But the issue that the proposal wants to resolve is not to introduce a
> specific behavior for the DLQ topic or something

I'm not sure this statement aligns with the PIP. It seems to me that
the PIP proposes solving the message loss issues by adding a
DLQ-specific feature to the Pulsar client.

Earlier, I proposed extending the `CommandProducer` command so that it
could create a subscription. That solution is not right: it tightly couples
producers and consumers, which we want to avoid.

I think we should consider adding a new policy for Pulsar topics: a
namespace (or topic) policy that makes it possible to retain messages
indefinitely when a topic has no subscriptions.

Our message retention feature is very valuable. However,
message retention doesn't solve the "slow to subscribe" consumer
problem. In the event of a long network partition, a consumer might not be
able to subscribe before messages are deleted. The proposed policy
mitigates that risk and allows users to set the message retention time
based on other needs, not on estimates of how long it
could take to subscribe to a topic.

This policy would also solve the DLQ message loss issue. Because it could
be set at the namespace level and the DLQ producer can produce to any
namespace, it would work even on clusters that do not have topic level
policies enabled.
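
To make the semantics of the proposed policy concrete, here is a minimal
Java sketch of the deletion decision as I picture it. Everything in it is
hypothetical: `RetentionSketch`, `Policy`, `retainWhenNoSubscriptions`, and
`canDelete` are illustrative names, not existing Pulsar APIs, and the model
of the current broker behavior is deliberately simplified.

```java
import java.time.Duration;

// Hypothetical model of the proposed policy; not part of Pulsar.
class RetentionSketch {

    // Simplified namespace (or topic) policy settings.
    static final class Policy {
        final Duration retentionTime;            // existing time-based retention
        final boolean retainWhenNoSubscriptions; // assumed new flag from this proposal

        Policy(Duration retentionTime, boolean retainWhenNoSubscriptions) {
            this.retentionTime = retentionTime;
            this.retainWhenNoSubscriptions = retainWhenNoSubscriptions;
        }
    }

    // Returns true if a message of the given age may be deleted.
    static boolean canDelete(Policy policy,
                             int subscriptionCount,
                             boolean ackedByAllSubscriptions,
                             Duration messageAge) {
        // Proposed behavior: a topic with no subscriptions keeps everything.
        if (subscriptionCount == 0 && policy.retainWhenNoSubscriptions) {
            return false;
        }
        // Simplified current behavior: unacknowledged messages stay in the backlog.
        if (subscriptionCount > 0 && !ackedByAllSubscriptions) {
            return false;
        }
        // Otherwise the message is deletable once the retention time has elapsed.
        return messageAge.compareTo(policy.retentionTime) >= 0;
    }
}
```

With the flag set and a retention time of zero, `canDelete` returns false
for a topic that has no subscriptions, however old the message is. Once a
subscription exists, the usual backlog and retention rules apply.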

Let me know what you think.

Thanks,
Michael

On Tue, Jan 4, 2022 at 10:33 PM PengHui Li <peng...@apache.org> wrote:
>
> Thanks for the great comments, Michael.
>
> Let me try to clarify some context about the issue that users encountered
> and the improvement that the proposal wants to introduce.
>
> > Before we get further into the implementation, I'd like to discuss
> whether the current behavior is the expected behavior, as this is
> the key motivation for this feature.
>
> The DLQ topic can be created dynamically, and users might have short
> data retention for a namespace, by time or by size. But the messages
> in the DLQ are usually handled afterward, so we should allow users
> to keep the data in the DLQ until they delete it manually.
>
> The DLQ always belongs to a subscriber, so a subscriber can use an initial
> subscription name to keep its DLQ messages from being cleaned up.
>
> So the key point for this proposal is to keep data in the lazily created DLQ
> topic until users want to delete it manually.
>
> > I think the DLQ's current behavior is the expected behavior because
> the DLQ is only a topic and topics lose messages unless they have a
> subscription or a retention policy.
>
> Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> But the issue that the proposal wants to resolve is not to introduce a
> specific behavior for the DLQ topic or something. It is just, from the
> perspective of the DLQ use case, to make it convenient for users to keep
> data in the DLQ.
>
> Without this option, it is not easy to set a subscription or a data
> retention policy for a lazily created DLQ topic.
>
> > I admit that it is not necessarily a nice default behavior to
> potentially lose messages, but this is the design for all topics.
> Based on the current design, an admin can create a retention policy
> for the topic or namespace. Then, consumers of the
> topic have the duration of the retention policy to discover the topic
> and create a subscription before messages are lost. Is there a reason
> this solution doesn't work for the DLQ topic?
>
> The difference here is when the subscriber subscribes to the topic.
> For a normal topic, the expected behavior is that the subscriber is able
> to read all messages of the topic. It can start consuming from the
> earliest, the latest, or any other valid position. But the DLQ contains
> part of the original data for a subscription. Users never expect to miss
> the head messages in the DLQ. Otherwise, you will get 1,2,3 first, then
> 4,5 go to the DLQ and you continue to receive 6,7, but 4,5 might be
> removed automatically by Pulsar.
>
> The current solution does not work well for the DLQ topic because it is
> not easy for users to set a different data retention policy or create a
> new subscription for a lazily created DLQ topic.
>
> > As an aside, I wonder if topic discoverability is part of the problem
> here. It would be extremely valuable to get notifications any
> time a topic is created. That would allow users to move away from
> polling for current topic names towards a more reactive design.
>
> The notification is a good idea, but for this case it has
> some drawbacks:
>
>    1. A delayed notification might not allow us to achieve the purpose
>    2. The complexity will increase: auth for the notifications, and users
>    need to handle the events
>
> But the notifications can help in lots of other areas, such as improving
> observability.
>
> Regards,
> Penghui
>
> On Tue, Jan 4, 2022 at 2:41 PM Michael Marshall <mmarsh...@apache.org>
> wrote:
>
> > Before we get further into the implementation, I'd like to discuss
> > whether the current behavior is the expected behavior, as this is
> > the key motivation for this feature.
> >
> > I think the DLQ's current behavior is the expected behavior because
> > the DLQ is only a topic and topics lose messages unless they have a
> > subscription or a retention policy.
> >
> > I admit that it is not necessarily a nice default behavior to
> > potentially lose messages, but this is the design for all topics.
> > Based on the current design, an admin can create a retention policy
> > for the topic or namespace. Then, consumers of the
> > topic have the duration of the retention policy to discover the topic
> > and create a subscription before messages are lost. Is there a reason
> > this solution doesn't work for the DLQ topic?
> >
> > Perhaps the disconnect here is that users of the DLQ feature do not
> > view the DLQ as only a Pulsar topic. I look forward to your thoughts.
> >
> > As an aside, I wonder if topic discoverability is part of the problem
> > here. It would be extremely valuable to get notifications any
> > time a topic is created. That would allow users to move away from
> > polling for current topic names towards a more reactive design.
> >
> > Thanks,
> > Michael
> >
> >
> > On Tue, Dec 28, 2021 at 7:59 PM Zike Yang
> > <zky...@streamnative.io.invalid> wrote:
> > >
> > > > Oh, that's a very interesting point. I think it'd be easy to add that
> > > > as "internal" feature, though I'm a bit puzzled on how to add that to
> > > > the producer API
> > >
> > > I think we can add a field `String initialSubscriptionName` to the
> > > Producer Configuration. And add a new field `optional string
> > > > initial_subscription_name` to the `CommandProducer`.
> > > When the Broker handles the CommandProducer, if it checks that the
> > > initialSubscriptionName is not empty or null, it will use
> > > initialSubscriptionName to create a subscription on that topic. When
> > > creating the deadLetterProducer or retryLetterProducer, we can specify
> > > and create the initial subscription directly through the Producer.
> > > What do you think?
> > >
> > > On Thu, Dec 23, 2021 at 7:42 AM Matteo Merli <matteo.me...@gmail.com>
> > wrote:
> > > >
> > > > > What if we extended the `CommandProducer` command to add a
> > > > > `create_subscription` field? Then, any time a topic is auto
> > > > > created and this field is true, the broker would auto create a
> > > > > subscription. There are some details to work out, but I think this
> > > > > feature would fulfill the needs of this PIP and would also be broadly
> > > > > useful for many client applications that dynamically create topics.
> > > >
> > > > Oh, that's a very interesting point. I think it'd be easy to add that
> > > > as "internal" feature, though I'm a bit puzzled on how to add that to
> > > > the producer API
> > >
> > >
> > >
> > > --
> > > Zike Yang
> >
