Hi Jiuming,

Thank you for bringing this up. From a Pulsar admin perspective, the current 
retention policy implementation does not guarantee that users can seek back to 
a position within a specific size limit unless they pay extra cost. For 
example, to guarantee the ability to seek back to a position 10GB earlier, 
users need to set `retention size = backlog quota + 10GB`. However, the 
backlog quota is typically set quite large to allow for significant data 
accumulation. Therefore, users must bear the cost of a large backlog quota 
(e.g., 100GB) to ensure they can seek back to a position 10GB earlier, even if 
the subscription has no backlog.
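
To make the cost concrete, here is a rough Python sketch of the arithmetic. It 
only models the size accounting discussed in this thread, not the broker code; 
the function names and parameters are hypothetical.

```python
def required_retention_gb(backlog_quota_gb: float, seek_back_gb: float) -> float:
    """Retention size needed to guarantee a seek-back window.

    Assumption: retention counts backlog and acknowledged messages
    together, so the worst case is a backlog that fills the whole quota.
    """
    return backlog_quota_gb + seek_back_gb


def max_acked_kept_gb(retention_gb: float, backlog_gb: float) -> float:
    """Upper bound on acknowledged data kept for a given backlog size."""
    return max(0, retention_gb - backlog_gb)


retention = required_retention_gb(backlog_quota_gb=100, seek_back_gb=10)

# Backlog at the quota: only the 10GB seek-back window is left for
# acknowledged messages.
print(max_acked_kept_gb(retention, backlog_gb=100))

# No backlog at all: the whole retention size may be filled with
# acknowledged messages, although only 10GB was actually wanted.
print(max_acked_kept_gb(retention, backlog_gb=0))
```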

Regards,
Yike
________________________________
From: 太上玄元道君 <dao...@apache.org>
Sent: Thursday, April 11, 2024 18:20
To: dev@pulsar.apache.org <dev@pulsar.apache.org>
Subject: [Discuss] Pulsar retention policy

Hi, Pulsar community,

I'm opening this thread to discuss the retention policy for managed ledgers.

Currently, the retention policy is defined as a time/size-based policy to
retain messages in the ledger, but there is a difference between the
official documentation and the actual code implementation.

The official documentation states that the retention policy is to retain
the messages that were *acknowledged*. For example, if the retention size
is set to 10GB and there are 20GB of messages acknowledged, Pulsar will
retain 10GB and delete the rest.

However, the actual code implementation is different. It retains the
messages that were *written* to the ledger, including *backlog messages*
and *acknowledged messages*. For instance, if there are 10GB of messages in
the backlog and 10GB of messages were acknowledged:
1. If the retention size is set to 10GB, Pulsar will only retain the 10GB
of messages in the backlog, and the 10GB of messages that were acknowledged
will be deleted.
2. If the retention size is set to 20GB, Pulsar will retain the 10GB of
messages in the backlog and the 10GB of messages that were acknowledged.
3. If the retention size is set to 5GB, Pulsar will retain the 10GB of
messages in the backlog, but the 10GB of messages that were acknowledged
will be deleted.
4. If the retention size is set to 15GB, Pulsar will retain the 10GB of
messages in the backlog and the 5GB of messages that were acknowledged. The
rest of the acknowledged messages will be deleted.
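
The four cases above can be summarized in a small Python sketch. It models
only the size-based behavior as described in this message, not the actual
managed ledger code; the function name `retained_acked_gb` and its parameters
are hypothetical.

```python
def retained_acked_gb(retention_gb: float, backlog_gb: float, acked_gb: float) -> float:
    """Size of acknowledged data kept, under the behavior described above.

    Assumption: backlog messages are never removed by retention, and the
    retention size counts backlog and acknowledged messages together.
    """
    # Whatever part of the retention size the backlog does not use up
    # is available for acknowledged messages.
    room_for_acked = max(0, retention_gb - backlog_gb)
    return min(acked_gb, room_for_acked)


# The four cases from the list: 10GB of backlog, 10GB acknowledged.
for retention in (10, 20, 5, 15):
    kept = retained_acked_gb(retention, backlog_gb=10, acked_gb=10)
    print(f"retention={retention}GB -> acknowledged data kept={kept}GB")
```

Under the documented meaning, by contrast, the result would simply be
`min(acked_gb, retention_gb)`, independent of the backlog.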

From the time Pulsar was open-sourced to the present, the code implementation has never
changed, but the meaning of the official documentation has gradually
shifted. So I'm just considering which one is better: the official
documentation or the code implementation? Does the change in the meaning of
the document align more with expectations? Does it indicate that users want
to retain the messages that were acknowledged?

For a long time, users have believed that the Retention Policy is for
retaining messages that were acknowledged. If we change the document to
match the code implementation, will it meet users' expectations?

What should we do? Change the document to match the code implementation or
change the code implementation to match the document?

Regards,
Tao Jiuming
