I see, it only affects the TTL. We can acknowledge the messages to
clean up the data. Then we have to document the `brokerInterceptors`
config clearly to say TTL might not work if
`AppendBrokerTimestampMetadataInterceptor` is not configured.

Thanks,
Yunze

On Thu, Jan 18, 2024 at 10:41 AM PengHui Li <peng...@apache.org> wrote:
>
> > I don't think we should provide such an option for users. Compared
> with the serious security issue, the extra overhead (13 bytes) per
> entry (not per message) should never be a concern.
>
> It actually changed the stored data format. As far as I know, there are
> some users
> who read data from the offloaded data files directly. So, at least, we
> should provide
> a way to allow them to avoid the stored data format change.
>
> And 13 bytes are also related to the use case. If users don't use TTL and
> send
> too many small messages without batch. The cost will be increased by 10% or
> 20%.
> The issue will not damage the data or cause the data loss.
> You can monitor it, to skip the backlogs. TTL is not the only way to evict
> backlogs.
>
> Thanks,
> Penghui
>
> On Thu, Jan 18, 2024 at 10:33 AM PengHui Li <peng...@apache.org> wrote:
>
> > > We should also improve the TTL:
> > > When client publish time > Ledger create time + Ledger rollover time,
> > > and `brokerPublishTime` is not set,
> > > we can make Ledger TTL time =  Ledger create time + Ledger rollover
> > time.
> >
> > > This change can make sure entry expires when client clock is not right.
> >
> > This solution looks good to me.
> >
> > Regards,
> > Penghui
> >
> >
> > On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu <x...@apache.org> wrote:
> >
> >> From my perspective, it's a serious security issue. The client side
> >> could easily make Pulsar's message expiry mechanism not work. Even if
> >> it's not a malicious change made by users, it would be hard to figure
> >> out what makes the message not expired.
> >>
> >> > Users can still have a way to disable it if they care about the
> >> additional metadata stored in each entry.
> >>
> >> I don't think we should provide such an option for users. Compared
> >> with the serious security issue, the extra overhead (13 bytes) per
> >> entry (not per message) should never be a concern.
> >>
> >> My suggestion is that the broker entry metadata with the broker side
> >> timestamp should always be generated and we should not allow users to
> >> disable it. However, `hasBrokerPublishTime` should not be deprecated.
> >> The broker entry metadata won't be transferred to the client unless
> >> `exposingBrokerEntryMetadataToClientEnabled` is true.
> >>
> >> Thanks,
> >> Yunze
> >>
> >> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin <lin...@apache.org> wrote:
> >> >
> >> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough.
> >> >
> >> > We should also improve the TTL:
> >> > When client publish time > Ledger create time + Ledger rollover time,
> >> > and `brokerPublishTime` is not set,
> >> > we can make Ledger TTL time =  Ledger create time + Ledger rollover
> >> time.
> >> >
> >> > This change can make sure entry expired when client clock is not right.
> >> >
> >> > On 2024/01/17 04:31:16 PengHui Li wrote:
> >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
> >> mean
> >> > > that they allow this bug.
> >> > > This is a configuration, not an API. It is difficult to use
> >> documentation
> >> > > to regulate user behavior.
> >> > >
> >> > > Actually, they need to know why they want to disable `
> >> > > AppendBrokerTimestampMetadataInterceptor`.
> >> > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata
> >> > > Interceptor`?
> >> > >
> >> > > Pulsar's default configuration tries to provide a best practice for
> >> most of
> >> > > the cases. To avoid the potential
> >> > > problem, Pulsar enables  `AppendBrokerTimestampMetadataInterceptor` by
> >> > > default. But it doesn't mean all the cases
> >> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users
> >> don't
> >> > > use TTL, the producers are
> >> > > well-maintained. Is there any reason to have at least 16 bytes append
> >> to
> >> > > each entry? The default configuration
> >> > > behavior change also needs to be highlighted in the release note.
> >> Users
> >> > > need to know if they don't disable the
> >> > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer
> >> version, the
> >> > > retained data size will increase.
> >> > >
> >> > > BTW, we need to explain each interceptor in the `broker.conf` and why
> >> `
> >> > > AppendBrokerTimestampMetadataInterceptor`
> >> > > is enabled by default. If users don't want to read it, change
> >> anything in
> >> > > `broker.conf` they don't really know what it is.
> >> > > How can they know what they expect?
> >> > >
> >> > > Regards,
> >> > > Penghui
> >> > >
> >> > >
> >> > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin <lin...@apache.org> wrote:
> >> > >
> >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
> >> mean
> >> > > > that they allow this bug.
> >> > > > This is a configuration, not an API. It is difficult to use
> >> documentation
> >> > > > to regulate user behavior.
> >> > > >
> >> > > > Maybe we can add a new field (BrokerTimestamp) to save the
> >> timestamp on
> >> > > > the Broker side.
> >> > > > The time priority for trimming Ledger is as follows:
> >> > > >
> >> > > > BrokerPublishTime > BrokerTimestamp
> >> > > >
> >> > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set.
> >> > > > If not, we set `BrokerReceiveTime`  and is no longer affected by
> >> client
> >> > > > time.
> >> > > >
> >> > > > On 2024/01/15 02:15:17 PengHui Li wrote:
> >> > > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor`
> >> by
> >> > > > default.
> >> > > > > Users can still have a way to disable it if they care about the
> >> > > > additional
> >> > > > > metadata stored
> >> > > > > in each entry.
> >> > > > >
> >> > > > > For the `hasBrokerPublishTime` API. The topic might also have
> >> historical
> >> > > > > data without
> >> > > > > broker publish time. So, it should be fine to keep this API
> >> because we
> >> > > > > don't know how
> >> > > > > long users will retain their data.
> >> > > > >
> >> > > > > Regards,
> >> > > > > Penghui
> >> > > > >
> >> > > > > On Sat, Jan 6, 2024 at 10:35 PM linlin <lin...@apache.org> wrote:
> >> > > > >
> >> > > > > > Now, if the message's metadata does not set a broker side
> >> timestamp,
> >> > > > the
> >> > > > > > ledger expiration check is based on the client's publish time.
> >> > > > > >
> >> > > > > > When the client machine's clock is incorrect (eg: set to 1 year
> >> later)
> >> > > > ,
> >> > > > > > the ledger can not be cleaned up. Issue
> >> > > > > > https://github.com/apache/pulsar/issues/21347
> >> > > > > >
> >> > > > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for
> >> > > > messages
> >> > > > > > on the broker side, but we can not ensure that the
> >> > > > > > `AppendBrokerTimestampMetadataInterceptor` is always enable
> >> > > > > >
> >> > > > > > Therefore, I open this PR(
> >> https://github.com/apache/pulsar/pull/21835)
> >> > > > to
> >> > > > > > always set the broker timestamp for messages on the broker side.
> >> > > > > >
> >> > > > > > With this change , firstly we should deprecate
> >> > > > > > AppendBrokerTimestampMetadataInterceptor.
> >> > > > > > It no longer needs to exist
> >> > > > > >
> >> > > > > > Secondly, we should deprecate `hasBrokerPublishTime` in
> >> interface
> >> > > > Message.
> >> > > > > > It always returns true.
> >> > > > > > This API is created in PR (
> >> https://github.com/apache/pulsar/pull/11553
> >> > > > )
> >> > > > > > This PR is for the client to obtain BrokerPublishTime, so the
> >> > > > > > `hasBrokerPublishTime` API is not necessary.
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >>
> >

Reply via email to