My primary concern with making this the new default is that the cost
of calculating the size is in the synchronization on the managed
ledger, which interrupts writes and reads through that managed ledger
and potentially others in some cases.

If the primary motivation for this PR is to avoid confusing users (and
I agree that the 0 value confuses users), we should consider making
the JSON result easier to interpret. It'd be just as easy to only
include the subscription backlog size when a client requests it.

What do you think about changing the result?

Thanks,
Michael

On Thu, Jan 19, 2023 at 4:03 AM Haiting Jiang <jianghait...@gmail.com> wrote:
>
> +1
>
> Haiting
>
> On Tue, Jan 17, 2023 at 5:50 PM Yubiao Feng
> <yubiao.f...@streamnative.io.invalid> wrote:
> >
> > Hi Asaf
> >
> > > It's worth noting that the estimated backlog size of a subscription is
> > estimated since it doesn't consider any acknowledged messages between the
> > mark delete position and the last message. It simply assumes all messages
> > between the mark delete position and the last message have not been
> > acknowledged.
> >
> > Yes, it's not the exact value of the backlog. There are two reasons for the
> > loss of accuracy:
> > - Whether the Entry size is closer to the `averageSize`.
> > - The number of messages after the mark deleted position has been
> > acknowledged.
> >
> > Thanks
> > Yubiao Feng
> >
> > On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:
> >
> > > Small question regarding this:
> > >
> > > The code for calculation is:
> > >
> > > long estimateBacklogFromPosition(PositionImpl pos) {
> > >     synchronized (this) {
> > >         long sizeBeforePosLedger =
> > > ledgers.headMap(pos.getLedgerId()).values()
> > >                 .stream().mapToLong(LedgerInfo::getSize).sum();
> > >         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
> > >         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
> > >         if (ledgerInfo == null) {
> > >             return sizeAfter;
> > >         } else if (pos.getLedgerId() == currentLedger.getId()) {
> > >             return sizeAfter - consumedLedgerSize(currentLedgerSize,
> > > currentLedgerEntries, pos.getEntryId());
> > >         } else {
> > >             return sizeAfter -
> > > consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(),
> > > pos.getEntryId());
> > >         }
> > >     }
> > > }
> > >
> > > and
> > >
> > > private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
> > > long consumedEntries) {
> > >     if (ledgerEntries <= 0) {
> > >         return 0;
> > >     }
> > >     if (ledgerEntries <= (consumedEntries + 1)) {
> > >         return ledgerSize;
> > >     } else {
> > >         long averageSize = ledgerSize / ledgerEntries;
> > >         return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize
> > > : 0;
> > >     }
> > > }
> > >
> > >
> > >
> > > It's worth noting that the estimated backlog size of a subscription is
> > > estimated since it doesn't consider any acknowledged messages between the
> > > mark delete position and the last message. It simply assumes all messages
> > > between the mark delete position and the last message have not been
> > > acknowledged.
> > >
> > > Good idea - +1
> > >
> > > On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <codelipeng...@gmail.com>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Penghui
> > > >
> > > > > On Jan 16, 2023, at 23:36, Yubiao Feng <yubiao.f...@streamnative.io
> > > .INVALID>
> > > > wrote:
> > > > >
> > > > > Hi community
> > > > >
> > > > > I am starting a DISCUSS for making the default value of the parameter
> > > > > "--get-subscription-backlog-size" of admin API "topics stats" true.
> > > > >
> > > > > In the PR https://github.com/apache/pulsar/pull/9302, the property
> > > > backlog
> > > > > size of each subscription returned in the response of the API topics
> > > > stats,
> > > > > by default this property is always equal to 0 in response, and this
> > > will
> > > > > confuse users. Since the calculation of backlog size is done in broker
> > > > > memory, there is no significant overhead(the process is described in
> > > the
> > > > > following section), so I think the correct values should be displayed
> > > by
> > > > > default.
> > > > >
> > > > > ### The following two APIs should be affected:
> > > > >
> > > > > In Pulsar admin API
> > > > > ```
> > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> > > > > --get-subscription-backlog-size
> > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > > > ```
> > > > > the default value of parameter `--get-subscription-backlog-size` will
> > > be
> > > > > `true`
> > > > >
> > > > > In Pulsar Rest API
> > > > > ```
> > > > > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> > > > > "?subscriptionBacklogSize=true
> > > > > ```
> > > > > the default value of parameter `subscriptionBacklogSize ` will be
> > > `true`
> > > > >
> > > > >
> > > > > ### The following is the process of calculating backlog size:
> > > > > - Divide `PersistentTopc.ledgers` into two parts according to the
> > > > ledgerId
> > > > > of the mark delete position of the cursor. The second part is ledgers
> > > > > indicating the messages still need to be consumed, aka
> > > > backlogSizeInLedgers.
> > > > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId of 
> > > > > the
> > > > > mark delete position of the cursor, and we can also divide the ledger
> > > > into
> > > > > two parts, the second part is entries indicating the messages still
> > > need
> > > > to
> > > > > be consumed, multiply the average size of each entry in metrics by the
> > > > > number of still need to be consumed entries we can get the backlog 
> > > > > size
> > > > in
> > > > > this ledger. aka backlogSizeInEntries.
> > > > > - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> > > > >
> > > > > Thanks
> > > > > Yubiao Feng
> > > >
> > > >
> > >

Reply via email to