Re: [DISCUSS] PIP-345: Optimize finding message by timestamp

Girish Sharma Thu, 14 Mar 2024 21:14:15 -0700

One suggestion: I think you can make do with storing just the begin timestamp.
Any search utilising these values will work the same way with just one of
those timestamps as it would with both begin and end.
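
To illustrate the point, here is a minimal sketch (not the PIP's actual
implementation). It assumes ledgers are kept sorted by begin timestamp and do
not overlap in time, so each ledger implicitly ends where the next one begins.
`LedgerInfo`, `begin_timestamp` and `find_ledger` are hypothetical names used
only for this example.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class LedgerInfo(NamedTuple):
    # Hypothetical record: a ledger id plus the publish time of its first entry.
    ledger_id: int
    begin_timestamp: int  # milliseconds since epoch


def find_ledger(ledgers: Sequence[LedgerInfo], target_ts: int) -> Optional[int]:
    """Return the id of the ledger whose time range covers target_ts.

    Assumption: ledgers are sorted by begin_timestamp and do not overlap,
    so ledger i ends where ledger i + 1 begins and no end timestamp is needed.
    """
    begins = [ledger.begin_timestamp for ledger in ledgers]
    # Index of the last ledger that starts at or before target_ts.
    idx = bisect_right(begins, target_ts) - 1
    if idx < 0:
        return None  # target_ts is older than every stored ledger
    return ledgers[idx].ledger_id


ledgers = [LedgerInfo(10, 1000), LedgerInfo(11, 2000), LedgerInfo(12, 3000)]
```

With these sample values, `find_ledger(ledgers, 2500)` picks ledger 11 without
ever consulting an end timestamp. If publish times can overlap across ledgers,
the assumption above breaks and this shortcut would need revisiting.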


Any particular reason you need both timestamps?

Regards

On Fri, Mar 15, 2024, 9:39 AM 太上玄元道君 <dao...@apache.org> wrote:

> bump
>
> On Sun, Mar 10, 2024, 6:41 AM 太上玄元道君 <dao...@apache.org> wrote:
>
> > Hi Pulsar community,
> >
> > A new PIP has been opened; this thread is to discuss PIP-345: Optimize
> > finding message by timestamp.
> >
> > Motivation:
> > Finding a message by timestamp is widely used in Pulsar:
> > * It is used by the `pulsar-admin` tool to get the message id by
> > timestamp, expire messages by timestamp, and reset cursor.
> > * It is used by the `pulsar-client` to reset the subscription to a
> > specific timestamp.
> > * It is also used by the `expiry-monitor` to find the messages that are
> > expired.
> > Even though the current implementation is correct and uses binary search
> > to speed things up, it is still not efficient *enough*.
> > The current implementation scans all the ledgers to find the message
> > by timestamp.
> > This is a performance bottleneck, especially for large topics with many
> > messages.
> > Say a topic has 1 million entries: with binary search, it takes about
> > 20 iterations to find the message.
> > In some extreme cases, this may lead to a timeout, and the client will not
> > be able to seek by timestamp.
> >
> > PIP: https://github.com/apache/pulsar/pull/22234
> >
> > Your feedback is very important to us; please take a moment to review the
> > proposal and provide your thoughts.
> >
> > Thanks,
> > Tao Jiuming
> >
>
