Hi Dave,
Thanks for your review!

Perhaps that's because I wrote the steps in too much detail; the key points
are (a sketch follows the list):
1. Deserialize MessageMetadata once, as soon as the broker receives the message
2. Pass the MessageMetadata to the `PublishContext`
3. After the add-entry operation finishes, get the `publishTimestamp` from
`PublishContext#messageMetadata` and update the `beginPublishTimestamp` and
`endPublishTimestamp` of the `Ledger`
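
For illustration, here is a minimal sketch of how steps 2 and 3 could be
wired together; the `getMessageMetadata()` accessor on `PublishContext` and
the completion hook are assumptions of this proposal, not existing Pulsar
APIs:
```
// Step 2 (sketch): PublishContext carries the metadata parsed in step 1.
// getMessageMetadata() is an assumed accessor, not an existing API.
interface PublishContext {
    MessageMetadata getMessageMetadata();
}

// Step 3 (sketch): a hypothetical add-entry completion hook in the broker.
class AddEntrySketch {
    private ManagedLedgerImpl managedLedger;

    void onAddEntryComplete(long ledgerId, PublishContext ctx) {
        MessageMetadata metadata = ctx.getMessageMetadata();
        if (metadata != null) {
            // Feed the publish time into the new ManagedLedgerImpl method
            // shown further below.
            managedLedger.updatePublishTimestamp(ledgerId, metadata.getPublishTime());
        }
    }
}
```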

Since we might already deserialize MessageMetadata during message publishing
on the broker side (PersistentTopic#isExceedMaximumDeliveryDelay,
MessageDeduplication), deserializing MessageMetadata once, when the broker
receives the message, helps reduce the number of MessageMetadata
deserializations in some cases.
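
As a sketch of that parse-once/reuse pattern (`Commands.parseMessageMetadata`
is the existing parser entry point; the `setMessageMetadata` and
`getMessageMetadata` accessors on `PublishContext` are assumptions of this
proposal):
```
// On receive (step 1): parse the metadata exactly once.
MessageMetadata metadata = Commands.parseMessageMetadata(headersAndPayload);
// Step 2: stash it on the publish context (hypothetical setter).
publishContext.setMessageMetadata(metadata);

// Later broker-side checks (delayed delivery, deduplication) reuse the
// cached copy instead of parsing the ByteBuf again:
MessageMetadata cached = publishContext.getMessageMetadata();
```
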
As for maintaining these new ledger fields, it looks like this:
```
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import org.apache.commons.lang3.tuple.MutablePair;

public class ManagedLedgerImpl {
    // New field: maps ledgerId -> (begin publish timestamp, end publish timestamp)
    private final NavigableMap<Long, MutablePair<Long, Long>> publishTimestamps =
            new ConcurrentSkipListMap<>();

    // New method: update the begin/end publish timestamps of a ledger after
    // an entry is added to it. Assumes updates for a given ledger happen on
    // a single thread (e.g. the managed-ledger executor); otherwise the pair
    // mutation would need synchronization.
    protected void updatePublishTimestamp(long ledgerId, long publishTimestamp) {
        MutablePair<Long, Long> pair = publishTimestamps.computeIfAbsent(
                ledgerId, k -> new MutablePair<>(Long.MAX_VALUE, Long.MIN_VALUE));
        pair.setLeft(Math.min(pair.getLeft(), publishTimestamp));
        pair.setRight(Math.max(pair.getRight(), publishTimestamp));
    }
}
```
I just use a map to maintain it; when the ledger is closed, its
`beginPublishTimestamp` and `endPublishTimestamp` are written to `LedgerInfo`.
Beyond that, no additional overhead is introduced.
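
A sketch of that close-time step; the hook itself is hypothetical, and the
two setters assume the begin/end publish timestamp fields the PIP proposes
to add to `LedgerInfo`:
```
// Sketch: a close-time hook on ManagedLedgerImpl. Persist the tracked pair
// into the (proposed) LedgerInfo fields, then drop the in-memory entry.
void onLedgerClosed(long ledgerId, LedgerInfo.Builder ledgerInfo) {
    MutablePair<Long, Long> pair = publishTimestamps.remove(ledgerId);
    if (pair != null) {
        ledgerInfo.setBeginPublishTimestamp(pair.getLeft());  // proposed field
        ledgerInfo.setEndPublishTimestamp(pair.getRight());   // proposed field
    }
}
```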

So, if you are asking about "the time spent", I would say it is *nearly* zero.

Thanks,
Tao Jiuming

Dave Fisher <w...@apache.org> wrote on Tue, Mar 12, 2024, at 10:50:

> What can you say about the time spent to maintain these new ledger fields?
> I think you are asking to modify the main message logic, which is highly
> optimized, but I’m not sure. Have you tried your code on your own
> hardware? Do you have performance comparisons of the normal flow?
>
> > On Mar 11, 2024, at 7:41 PM, 太上玄元道君 <dao...@apache.org> wrote:
> >
> > bump
> >
> > 太上玄元道君 <dao...@apache.org> wrote on Mon, Mar 11, 2024, at 17:55:
> >
> >> bump
> >>
> >> 太上玄元道君 <dao...@apache.org> wrote on Sun, Mar 10, 2024, at 06:41:
> >>
> >>> Hi Pulsar community,
> >>>
> >>> A new PIP is opened; this thread is to discuss PIP-345: Optimize
> >>> finding message by timestamp.
> >>>
> >>> Motivation:
> >>> Finding message by timestamp is widely used in Pulsar:
> >>> * It is used by the `pulsar-admin` tool to get the message id by
> >>> timestamp, expire messages by timestamp, and reset cursor.
> >>> * It is used by the `pulsar-client` to reset the subscription to a
> >>> specific timestamp.
> >>> * It is also used by the `expiry-monitor` to find messages that have
> >>> expired.
> >>> Even though the current implementation is correct and uses binary
> >>> search to speed things up, it is still not efficient *enough*.
> >>> The current implementation is to scan all the ledgers to find the
> >>> message by timestamp.
> >>> This is a performance bottleneck, especially for large topics with many
> >>> messages.
> >>> Say there is a topic which has 1M entries; through binary search,
> >>> it will take 20 iterations to find the message.
> >>> In some extreme cases, it may lead to a timeout, and the client will
> >>> not be able to seek by timestamp.
> >>>
> >>> PIP: https://github.com/apache/pulsar/pull/22234
> >>>
> >>> Your feedback is very important to us; please take a moment to review
> >>> the proposal and provide your thoughts.
> >>>
> >>> Thanks,
> >>> Tao Jiuming
> >>>
> >>
>
>
