Hi Asaf

> It's worth noting that the backlog size of a subscription is only an
> estimate, since it doesn't take into account messages acknowledged between
> the mark-delete position and the last message. It simply assumes that none
> of the messages between the mark-delete position and the last message have
> been acknowledged.

Yes, it's not the exact backlog size. The estimate loses accuracy for two
reasons:
- How far the actual entry sizes deviate from the ledger's `averageSize`.
- How many messages after the mark-delete position have already been
  acknowledged individually (they are still counted as backlog).
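A small hypothetical example (not Pulsar code) shows both effects on a single ledger. It assumes one ledger with entry IDs starting at 0, compares the exact backlog with the average-size estimate, and uses made-up entry sizes:

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Pulsar code: one ledger, entry IDs 0..n-1.
class BacklogEstimateError {

    // Exact backlog: the real sizes of the entries after the mark-delete
    // position that have not been individually acknowledged.
    static long exactBacklog(List<Long> entrySizes, int markDeleteEntryId,
                             Set<Integer> ackedEntryIds) {
        long total = 0;
        for (int id = markDeleteEntryId + 1; id < entrySizes.size(); id++) {
            if (!ackedEntryIds.contains(id)) {
                total += entrySizes.get(id);
            }
        }
        return total;
    }

    // Estimate: every entry after the mark-delete position counts as backlog,
    // and each one is assumed to have the ledger's average size.
    static long estimatedBacklog(List<Long> entrySizes, int markDeleteEntryId) {
        long ledgerSize = entrySizes.stream().mapToLong(Long::longValue).sum();
        long averageSize = ledgerSize / entrySizes.size();
        long remainingEntries = Math.max(0, entrySizes.size() - (markDeleteEntryId + 1L));
        return remainingEntries * averageSize;
    }

    public static void main(String[] args) {
        // Entries 0 and 1 are mark-deleted; entry 3 (much larger) was acked individually.
        List<Long> sizes = List.of(100L, 100L, 100L, 1_000L);
        int markDelete = 1;
        System.out.println("exact=" + exactBacklog(sizes, markDelete, Set.of(3))
                + " ignoringAcks=" + exactBacklog(sizes, markDelete, Set.of())
                + " estimated=" + estimatedBacklog(sizes, markDelete));
    }
}
```

Running it prints `exact=100 ignoringAcks=1100 estimated=650`: the estimate is below 1,100 because entry 3 is far larger than the 325-byte average, and above 100 because the individual ack of entry 3 is not taken into account.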

Thanks
Yubiao Feng

On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:

> Small question regarding this:
>
> The code for the calculation is:
>
> long estimateBacklogFromPosition(PositionImpl pos) {
>     synchronized (this) {
>         // Total size of all ledgers before the ledger that contains `pos`.
>         long sizeBeforePosLedger = ledgers.headMap(pos.getLedgerId()).values()
>                 .stream().mapToLong(LedgerInfo::getSize).sum();
>         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
>         // Size of the ledger that contains `pos` plus every later ledger.
>         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
>         if (ledgerInfo == null) {
>             return sizeAfter;
>         } else if (pos.getLedgerId() == currentLedger.getId()) {
>             // The current ledger is still open: use its live size and entry count.
>             return sizeAfter - consumedLedgerSize(currentLedgerSize,
>                     currentLedgerEntries, pos.getEntryId());
>         } else {
>             return sizeAfter - consumedLedgerSize(ledgerInfo.getSize(),
>                     ledgerInfo.getEntries(), pos.getEntryId());
>         }
>     }
> }
>
> and
>
> // `consumedEntries` receives the entry ID of the position, so entries
> // 0..consumedEntries of the ledger count as consumed.
> private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
>         long consumedEntries) {
>     if (ledgerEntries <= 0) {
>         return 0;
>     }
>     if (ledgerEntries <= (consumedEntries + 1)) {
>         // The whole ledger has been consumed.
>         return ledgerSize;
>     } else {
>         // Assume every entry has the ledger's average size.
>         long averageSize = ledgerSize / ledgerEntries;
>         return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize : 0;
>     }
> }
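For anyone who wants to experiment with the logic above, here is a standalone sketch of it. It is a simplified model, not the broker code: `LedgerInfo`, `addClosedLedger` and `setCurrentLedger` are hypothetical stand-ins, a position is passed as a plain ledger ID and entry ID instead of `PositionImpl`, and the open ledger is assumed to have a zero-size placeholder in the map while its live size and entry count are tracked separately.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified, hypothetical model of the quoted estimateBacklogFromPosition logic.
class BacklogEstimator {

    // Stand-in for the ledger metadata: size in bytes and number of entries.
    static final class LedgerInfo {
        final long size;
        final long entries;
        LedgerInfo(long size, long entries) { this.size = size; this.entries = entries; }
    }

    private final NavigableMap<Long, LedgerInfo> ledgers = new TreeMap<>();
    private long currentLedgerId = -1;
    private long currentLedgerSize;
    private long currentLedgerEntries;

    // Registers a closed ledger whose size and entry count are final.
    void addClosedLedger(long ledgerId, long size, long entries) {
        ledgers.put(ledgerId, new LedgerInfo(size, entries));
    }

    // Registers the open ledger: a zero-size placeholder in the map (assumption),
    // with the live size and entry count kept in separate fields.
    void setCurrentLedger(long ledgerId, long size, long entries) {
        ledgers.put(ledgerId, new LedgerInfo(0, 0));
        currentLedgerId = ledgerId;
        currentLedgerSize = size;
        currentLedgerEntries = entries;
    }

    long getTotalSize() {
        return ledgers.values().stream().mapToLong(li -> li.size).sum() + currentLedgerSize;
    }

    synchronized long estimateBacklogFromPosition(long ledgerId, long entryId) {
        // Bytes in ledgers strictly before the position's ledger.
        long sizeBeforePosLedger = ledgers.headMap(ledgerId).values()
                .stream().mapToLong(li -> li.size).sum();
        LedgerInfo ledgerInfo = ledgers.get(ledgerId);
        long sizeAfter = getTotalSize() - sizeBeforePosLedger;
        if (ledgerInfo == null) {
            return sizeAfter;
        } else if (ledgerId == currentLedgerId) {
            return sizeAfter - consumedLedgerSize(currentLedgerSize, currentLedgerEntries, entryId);
        } else {
            return sizeAfter - consumedLedgerSize(ledgerInfo.size, ledgerInfo.entries, entryId);
        }
    }

    // Same arithmetic as the quoted consumedLedgerSize: `consumedEntries` is an entry ID.
    private static long consumedLedgerSize(long ledgerSize, long ledgerEntries, long consumedEntries) {
        if (ledgerEntries <= 0) {
            return 0;
        }
        if (ledgerEntries <= consumedEntries + 1) {
            return ledgerSize;
        }
        long averageSize = ledgerSize / ledgerEntries;
        return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize : 0;
    }

    public static void main(String[] args) {
        BacklogEstimator ml = new BacklogEstimator();
        ml.addClosedLedger(1, 400, 4);
        ml.addClosedLedger(2, 1_000, 10);
        ml.setCurrentLedger(3, 2_000, 20);
        // Mark-delete position 1:1, so entries 0 and 1 of ledger 1 are consumed.
        System.out.println(ml.estimateBacklogFromPosition(1, 1));
    }
}
```

With these made-up numbers, `main` prints `3200`: 3,400 bytes in total, minus two consumed entries of ledger 1 at its 100-byte average.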
>
> It's worth noting that the backlog size of a subscription is only an
> estimate, since it doesn't take into account messages acknowledged between
> the mark-delete position and the last message. It simply assumes that none
> of the messages between the mark-delete position and the last message have
> been acknowledged.
>
> Good idea - +1
>
> On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <codelipeng...@gmail.com> wrote:
>
> > +1
> >
> > Penghui
> >
> > > On Jan 16, 2023, at 23:36, Yubiao Feng <yubiao.f...@streamnative.io.INVALID> wrote:
> > >
> > > Hi community
> > >
> > > I am starting a DISCUSS thread about changing the default value of the
> > > `--get-subscription-backlog-size` parameter of the `topics stats` admin
> > > API to `true`.
> > >
> > > PR https://github.com/apache/pulsar/pull/9302 added a backlog size
> > > property for each subscription to the response of the `topics stats` API.
> > > By default this property is always 0 in the response, which confuses
> > > users. The backlog size is calculated in broker memory from ledger
> > > metadata (the process is described in the following section), so there is
> > > no significant overhead, and I think the correct values should be
> > > displayed by default.
> > >
> > > ### The following two APIs would be affected:
> > >
> > > In the Pulsar admin CLI:
> > > ```
> > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 --get-subscription-backlog-size
> > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > ```
> > > the default value of the `--get-subscription-backlog-size` parameter will
> > > be `true`.
> > >
> > > In the Pulsar REST API:
> > > ```
> > > curl "http://127.0.0.1:8080/admin/v2/persistent/test-tenant/ns1/tp1/stats?subscriptionBacklogSize=true"
> > > ```
> > > the default value of the `subscriptionBacklogSize` parameter will be `true`.
> > >
> > >
> > > ### The following is the process of calculating the backlog size:
> > > - Divide the topic's ledgers (`ManagedLedgerImpl.ledgers`) into two parts
> > >   at the ledgerId of the cursor's mark-delete position. The second part
> > >   (the ledgers after that one) holds messages that still need to be
> > >   consumed; their total size is `backlogSizeInLedgers`.
> > > - Find the LedgerInfo whose ledgerId matches the ledgerId of the cursor's
> > >   mark-delete position and divide that ledger's entries into two parts in
> > >   the same way. The second part holds entries that still need to be
> > >   consumed; multiplying their number by the ledger's average entry size
> > >   (ledger size / entry count) gives the backlog size in this ledger,
> > >   `backlogSizeInEntries`.
> > > - The backlog size is `backlogSizeInLedgers` + `backlogSizeInEntries`.
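Written as one formula (a sketch of the arithmetic described in these steps), with $l_m$ and $e_m$ the ledger ID and entry ID of the mark-delete position (entry IDs start at 0, so $e_m + 1$ entries of ledger $l_m$ are consumed), $S_l$ the size of ledger $l$ in bytes and $E_l$ its number of entries:

$$
\text{backlogSize} = \underbrace{\sum_{l > l_m} S_l}_{\texttt{backlogSizeInLedgers}} + \underbrace{\bigl(E_{l_m} - (e_m + 1)\bigr) \cdot \left\lfloor \frac{S_{l_m}}{E_{l_m}} \right\rfloor}_{\texttt{backlogSizeInEntries}}
$$

For example, with two later ledgers of 1,000 and 2,000 bytes and the mark-delete position at entry 1 of a 400-byte, 4-entry ledger, this gives $3000 + (4 - 2) \cdot 100 = 3200$ bytes. The quoted `estimateBacklogFromPosition` instead subtracts the consumed part $(e_m + 1) \cdot \lfloor S_{l_m} / E_{l_m} \rfloor$ from the ledger size, so when the ledger is only partly consumed the two results differ by $S_{l_m} \bmod E_{l_m}$ bytes.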
> > >
> > > Thanks
> > > Yubiao Feng
> >
> >
>
