Re: [VOTE] PIP-363: Add callback parameters to the method: org.apache.pulsar.client.impl.SendCallback.sendComplete.

2024-07-08 Thread
bump

Jie crossover wrote on Mon, Jun 24, 2024 at 10:01:

> Hi, Pulsar Community.
>
> I would like to start the voting thread for PIP-363: Add callback
> parameters to the method:
> org.apache.pulsar.client.impl.SendCallback.sendComplete.
>
> PIP: https://github.com/apache/pulsar/pull/22940
> Discussion thread:
> https://lists.apache.org/thread/8pgmsvx1bxz4z1w8prpvpnfpt1kb57c9
> --
> Best Regards!
> crossoverJie
>


Re: [VOTE] PIP-363: Add callback parameters to the method: org.apache.pulsar.client.impl.SendCallback.sendComplete.

2024-06-23 Thread
+1 nonbinding

Jie crossover wrote on Mon, Jun 24, 2024 at 10:01:

> Hi, Pulsar Community.
>
> I would like to start the voting thread for PIP-363: Add callback
> parameters to the method:
> org.apache.pulsar.client.impl.SendCallback.sendComplete.
>
> PIP: https://github.com/apache/pulsar/pull/22940
> Discussion thread:
> https://lists.apache.org/thread/8pgmsvx1bxz4z1w8prpvpnfpt1kb57c9
> --
> Best Regards!
> crossoverJie
>


Re: [DISCUSS] PIP-363: Add callback parameters to the method: org.apache.pulsar.client.impl.SendCallback.sendComplete.

2024-06-21 Thread
Hi Jie,

Thanks, so we don't need to fix Skywalking, for OpenTelemetry-javaagent,
I'll create a PR to fix this.

Jie crossover wrote on Fri, Jun 21, 2024 at 17:52:

> Hi Jiuming,
>
> Good suggestion, done.
> --
> Best Regards!
> crossoverJie
>
>
> 太上玄元道君 wrote on Fri, Jun 21, 2024 at 17:34:
>
> > Hi Jie,
> >
> > I suggest you don't modify the parameters order.
> >
> > Change
> > ```
> > void sendComplete(OpSendMsgStats stats, Exception e);
> > ```
> > to
> > ```
> > void sendComplete(Exception e, OpSendMsgStats stats);
> > ```
> >
> > Thanks,
> > Tao Jiuming
> >
> > Jie crossover wrote on Fri, Jun 21, 2024 at 14:05:
> >
> > > Hi Jiuming,
> > > You are right. Because batch send data cannot be obtained from the
> > > current pulsar-client, once this feature is released we will need to add
> > > agents in OTel and Skywalking based on the latest version.
> > >
> > > --
> > > Best Regards!
> > > crossoverJie
> > >
> > >
> > > 太上玄元道君 wrote on Fri, Jun 21, 2024 at 13:30:
> > >
> > > > Hi Jie,
> > > >
> > > > I support the PIP, but there is something we need to consider:
> > > > the OpenTelemetry javaagent [1] and the Apache Skywalking javaagent [2]
> > > > enhance `SendCallback` to monitor message sending, so if we change the
> > > > method signature, those java agents will break.
> > > > It would be great if you could keep it compatible; otherwise we will
> > > > have to push PRs to OpenTelemetry/Skywalking to fix it.
> > > >
> > > > Links:
> > > > [1]
> > > > https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/pulsar/pulsar-2.8/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/pulsar/v2_8/ProducerImplInstrumentation.java
> > > > [2]
> > > > https://github.com/dao-jun/skywalking-java/blob/main/apm-sniffer/apm-sdk-plugin/pulsar-common/src/main/java/org/apache/skywalking/apm/plugin/pulsar/common/SendCallbackInterceptor.java
> > > >
> > > > Thanks,
> > > > Tao Jiuming
> > > >
> > > > Zixuan Liu wrote on Fri, Jun 21, 2024 at 12:25:
> > > >
> > > > > +1
> > > > >
> > > > > Jie crossover wrote on Wed, Jun 19, 2024 at 16:12:
> > > > >
> > > > > > Hi, Pulsar Community.
> > > > > >
> > > > > > I have created a PIP to add an `OpSendMsgStats` parameter to the
> > > > > > `SendCallback` interface.
> > > > > > This will help us implement the `messaging.publish.messages` metric.
> > > > > > PIP link: https://github.com/apache/pulsar/pull/22940
> > > > > > OpenTelemetry Messaging Metrics link:
> > > > > > https://opentelemetry.io/docs/specs/semconv/messaging/messaging-metrics/#metric-messagingpublishduration
> > > > > > --
> > > > > > Best Regards!
> > > > > > crossoverJie
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] PIP-363: Add callback parameters to the method: org.apache.pulsar.client.impl.SendCallback.sendComplete.

2024-06-21 Thread
Hi Jie,

I suggest you don't modify the parameters order.

Change
```
void sendComplete(OpSendMsgStats stats, Exception e);
```
to
```
void sendComplete(Exception e, OpSendMsgStats stats);
```
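For illustration only (an editor's sketch, not part of the PIP), one backward-source-compatible shape would keep a legacy single-argument method as a default that delegates to the new two-argument one. The `OpSendMsgStats` below is a stand-in, not Pulsar's real interface, and note that a default method would not by itself fix java agents that match the old method descriptor at the bytecode level:

```java
// Hypothetical sketch: the new two-argument callback (exception first, as
// suggested above) plus a legacy default method delegating to it, so code
// written against the old single-argument shape still compiles.
interface OpSendMsgStats {
    long getNumMessagesInBatch();
}

interface SendCallback {
    // New signature proposed in the discussion above.
    void sendComplete(Exception e, OpSendMsgStats stats);

    // Legacy entry point; stats are simply unavailable on this path.
    default void sendComplete(Exception e) {
        sendComplete(e, null);
    }
}

public class SendCallbackCompatSketch {
    public static void main(String[] args) {
        SendCallback cb = (e, stats) ->
                System.out.println("stats present: " + (stats != null));
        cb.sendComplete(null);            // legacy path -> stats present: false
        cb.sendComplete(null, () -> 1L);  // new path    -> stats present: true
    }
}
```

Since `SendCallback` keeps a single abstract method, it stays usable as a lambda target on both paths.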

Thanks,
Tao Jiuming

Jie crossover wrote on Fri, Jun 21, 2024 at 14:05:

> Hi Jiuming,
> You are right. Because batch send data cannot be obtained from the current
> pulsar-client, once this feature is released we will need to add agents in
> OTel and Skywalking based on the latest version.
>
> --
> Best Regards!
> crossoverJie
>
>
> 太上玄元道君 wrote on Fri, Jun 21, 2024 at 13:30:
>
> > Hi Jie,
> >
> > I support the PIP, but there is something we need to consider:
> > the OpenTelemetry javaagent [1] and the Apache Skywalking javaagent [2]
> > enhance `SendCallback` to monitor message sending, so if we change the
> > method signature, those java agents will break.
> > It would be great if you could keep it compatible; otherwise we will have
> > to push PRs to OpenTelemetry/Skywalking to fix it.
> >
> > Links:
> > [1]
> > https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/pulsar/pulsar-2.8/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/pulsar/v2_8/ProducerImplInstrumentation.java
> > [2]
> > https://github.com/dao-jun/skywalking-java/blob/main/apm-sniffer/apm-sdk-plugin/pulsar-common/src/main/java/org/apache/skywalking/apm/plugin/pulsar/common/SendCallbackInterceptor.java
> >
> > Thanks,
> > Tao Jiuming
> >
> > Zixuan Liu wrote on Fri, Jun 21, 2024 at 12:25:
> >
> > > +1
> > >
> > > Jie crossover wrote on Wed, Jun 19, 2024 at 16:12:
> > >
> > > > Hi, Pulsar Community.
> > > >
> > > > I have created a PIP to add an `OpSendMsgStats` parameter to the
> > > > `SendCallback` interface.
> > > > This will help us implement the `messaging.publish.messages` metric.
> > > > PIP link: https://github.com/apache/pulsar/pull/22940
> > > > OpenTelemetry Messaging Metrics link:
> > > > https://opentelemetry.io/docs/specs/semconv/messaging/messaging-metrics/#metric-messagingpublishduration
> > > > --
> > > > Best Regards!
> > > > crossoverJie
> > > >
> > >
> >
>


Re: [DISCUSS] PIP-363: Add callback parameters to the method: org.apache.pulsar.client.impl.SendCallback.sendComplete.

2024-06-20 Thread
Hi Jie,

I support the PIP, but there is something we need to consider:
the OpenTelemetry javaagent [1] and the Apache Skywalking javaagent [2]
enhance `SendCallback` to monitor message sending, so if we change the
method signature, those java agents will break.
It would be great if you could keep it compatible; otherwise we will have
to push PRs to OpenTelemetry/Skywalking to fix it.

Links:
[1]
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/pulsar/pulsar-2.8/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/pulsar/v2_8/ProducerImplInstrumentation.java
[2]
https://github.com/dao-jun/skywalking-java/blob/main/apm-sniffer/apm-sdk-plugin/pulsar-common/src/main/java/org/apache/skywalking/apm/plugin/pulsar/common/SendCallbackInterceptor.java

Thanks,
Tao Jiuming

Zixuan Liu wrote on Fri, Jun 21, 2024 at 12:25:

> +1
>
> Jie crossover wrote on Wed, Jun 19, 2024 at 16:12:
>
> > Hi, Pulsar Community.
> >
> > I have created a PIP to add an `OpSendMsgStats` parameter to the
> > `SendCallback` interface.
> > This will help us implement the `messaging.publish.messages` metric.
> > PIP link: https://github.com/apache/pulsar/pull/22940
> > OpenTelemetry Messaging Metrics link:
> > https://opentelemetry.io/docs/specs/semconv/messaging/messaging-metrics/#metric-messagingpublishduration
> > --
> > Best Regards!
> > crossoverJie
> >
>


Re: [VOTE] PIP-355: Enhancing Broker-Level Metrics for Pulsar

2024-05-31 Thread
+1 nonbinding

Thanks,
Tao Jiuming

Hang Chen wrote on Fri, May 31, 2024 at 08:47:

> Hi, all
>
> I would like to start the voting thread for PIP-355: Enhancing
> Broker-Level Metrics for Pulsar
>
> PIP: https://github.com/apache/pulsar/pull/22778
>
> Thanks
> Hang
>


Re: [DISCUSS] PIP-353: Improve transaction message visibility for peek-messages

2024-05-21 Thread
+1

Baodi Shi wrote on Mon, May 20, 2024 at 18:48:

> Hi all, I pushed a PIP to improve transaction message visibility for the
> peek-messages command.
>
> https://github.com/apache/pulsar/pull/22746
>
> Please feel free to leave your ideas.
>
>
> Thanks,
> Baodi Shi
>


Re: [VOTE] PIP-347: add role field in consumer's stat

2024-05-14 Thread
+1 nonbinding

Thanks,
Tao Jiuming

thetumbled wrote on Tue, May 14, 2024 at 17:26:

> Hi, Pulsar Community.
>   I would like to start the voting thread for PIP-347: add role field in
> consumer's stat.
>   Proposal PR: https://github.com/apache/pulsar/pull/22564
>   Implementation PR: https://github.com/apache/pulsar/pull/22562
>
> Thanks,
> Wenzhi Feng(thetumbled).


Re: [VOTE] PIP-350: Allow to disable the managedLedgerOffloadDeletionLagInMillis

2024-05-12 Thread
+1 nonbinding

Thanks,
Tao Jiuming

Yong Zhang wrote on Mon, May 13, 2024 at 10:57:

> Hi,
>
> I would like to start the voting thread for PIP-350.
> https://github.com/apache/pulsar/pull/22688
>
> The implementation PR is:
> https://github.com/apache/pulsar/pull/22689
>
> Discuss thread:
> https://lists.apache.org/thread/7tlpkcm2933ddg95kgrb42943r4gq3v9
>
> Thanks,
> Yong
>


Re: [DISCUSS] PIP-350: Allow to disable the managedLedgerOffloadDeletionLagInMillis

2024-05-09 Thread
looks good

Yong Zhang wrote on Fri, May 10, 2024 at 11:31:

> Hi all,
>
> I pushed a new proposal to allow using -1 to disable
> managedLedgerOffloadDeletionLagInMillis, so that data is not deleted from
> BookKeeper.
>
> Please take a look and share your thoughts. Thanks!
>
> PIP: https://github.com/apache/pulsar/pull/22688
>
> Best Regards,
> Yong
>


Re: [VOTE] PIP-349: Add systemCursorNames ignore list for TTL check

2024-05-07 Thread
+1 nonbinding

Thanks,
Tao Jiuming

Hang Chen wrote on Tue, May 7, 2024 at 11:25:

> Hi guys,
>  I want to start voting for PIP-349.
>
> PIP: https://github.com/apache/pulsar/pull/22651
> PR: https://github.com/apache/pulsar/pull/22614
>
> DISCUSSION Thread:
> https://lists.apache.org/thread/xgcworz4j8rjlqwr476s7sqn9do43f1t
>
> Thanks,
> Hang
>


Re: [VOTE] PIP-348: Trigger offload on topic load stage

2024-05-07 Thread
+1 nonbinding

Thanks,
Tao Jiuming

Hang Chen wrote on Tue, May 7, 2024 at 11:27:

> Hi guys,
>  I want to start voting for PIP-348.
>
> PIP: https://github.com/apache/pulsar/pull/22650
> PR: https://github.com/apache/pulsar/pull/22652
>
> DISCUSSION Thread:
> https://lists.apache.org/thread/2ndomp8v4wkcykzthhlyjqfmswor88kv
>
> Thanks,
> Hang
>


Re: [DISCUSS] PIP-348: Trigger offload on topic load stage

2024-05-05 Thread
Hi Enrico,

I still think it doesn't matter:

1. Whether to offload a topic is configured by
`managedLedgerOffloadAutoTriggerSizeThresholdBytes`/`managedLedgerOffloadThresholdInSeconds`;
even though we may trigger a bunch of work, it's expected behavior.

2. After a topic's offload is triggered, the offload task is added to an
executor. Currently we have 2 threads executing offloads in sync mode,
so even with thousands of topics only 2 tasks can be executed
at the same time; the other tasks wait.

3. If you are worried about the system being overwhelmed, there is a PR to
introduce offload throttling (https://github.com/apache/pulsar/pull/22385)
for the purpose of protecting the broker. It has been pending review
for a long time; I would really appreciate it if you could help review it.

4. There are too many flags in Pulsar (~500), so I think we should be
cautious when introducing new flags.

Thanks,
Tao Jiuming

Enrico Olivelli wrote on Sun, May 5, 2024 at 13:44:

> daojun,
>
> 太上玄元道君 wrote on Sat, May 4, 2024 at 22:06:
>
> > It doesn't matter; we don't need to wait until the offload finishes when
> > loading a topic, since offloading may take a long time.
> > It just triggers offloading, and we don't need to handle the result.
> >
>
> We don't have to wait for the results, but we still trigger a bunch of
> work, and if the broker has thousands of topics there may be a herd effect.
> As this is a behavior change, I think it is safest to have a flag to
> disable this new behavior,
> otherwise upgrading an existing big cluster may lead to unexpected results,
> like overwhelmed systems.
>
>
> Enrico
>
>
>
> >
> > Enrico Olivelli wrote on Sun, May 5, 2024 at 02:41:
> >
> > > Overall I think it is a good idea.
> > >
> > > What about adding a flag to enable this behavior?
> > >
> > > In case there are problems with the offloader, the topic won't be able
> > > to load.
> > >
> > > Enrico
> > >
> > > Hang Chen wrote on Sat, May 4, 2024 at 17:12:
> > >
> > > > Hi guys,
> > > >  I pushed a new proposal to improve the tiered storage offload
> > > > efficiency. Please help take a look, thanks.
> > > >
> > > > PIP: https://github.com/apache/pulsar/pull/22650
> > > >
> > > > Thanks,
> > > > Hang
> > > >
> > >
> >
>


Re: [DISCUSS] PIP-349: Add systemCursorNames ignore list for TTL check

2024-05-04 Thread
looks good

Hang Chen wrote on Sun, May 5, 2024 at 00:02:

> Hi guys,
> I have proposed a new PIP to reduce the TTL impact on system
> cursors. Please help take a look, thanks.
>
> PIP: https://github.com/apache/pulsar/pull/22651
>
> Thanks,
> Hang
>


Re: [DISCUSS] PIP-348: Trigger offload on topic load stage

2024-05-04 Thread
It doesn't matter; we don't need to wait until the offload finishes when
loading a topic, since offloading may take a long time.
It just triggers offloading, and we don't need to handle the result.

Enrico Olivelli wrote on Sun, May 5, 2024 at 02:41:

> Overall I think it is a good idea.
>
> What about adding a flag to enable this behavior?
>
> In case there are problems with the offloader, the topic won't be able to
> load.
>
> Enrico
>
> Hang Chen wrote on Sat, May 4, 2024 at 17:12:
>
> > Hi guys,
> >  I pushed a new proposal to improve the tiered storage offload
> > efficiency. Please help take a look, thanks.
> >
> > PIP: https://github.com/apache/pulsar/pull/22650
> >
> > Thanks,
> > Hang
> >
>


Re: [DISCUSS] PIP-347: add role field in consumer's stat

2024-04-25 Thread
It looks like https://github.com/apache/pulsar/pull/22562 is the
implementation PR and just introduces a `role` field in consumer stats.

It is acceptable and I agree with it.

Thanks,
Tao Jiuming

thetumbled wrote on Tue, Apr 23, 2024 at 16:23:

> Hi, pulsar community.
>   I opened a new PIP for finding out the owner of subscriptions.
> Link: https://github.com/apache/pulsar/pull/22564
> Thanks.
> thetumbled.


Re: [VOTE] PIP-327 Support force topic loading for unrecoverable errors

2024-04-25 Thread
+1 nonbinding


Rajan Dhabalia wrote on Wed, Apr 17, 2024 at 01:08:

> Hi,
>
> I would like to start the voting thread for PIP-327, which also addresses
> issue https://github.com/apache/pulsar/issues/21751
>
> PIP design PR:
> https://github.com/apache/pulsar/pull/21752
> 
>
> Thread:
> https://lists.apache.org/thread/w7w91xztdyy07otw0dh71nl2rn3yy45p
>
> Thanks,
> Rajan
>


Re: [Discuss] Pulsar retention policy

2024-04-24 Thread
bump

太上玄元道君 wrote on Thu, Apr 11, 2024 at 18:20:

> Hi, Pulsar community,
>
> I'm opening this thread to discuss the retention policy for managed
> ledgers.
>
> Currently, the retention policy is defined as a time/size-based policy to
> retain messages in the ledger, but there is a difference between the
> official documentation and the actual code implementation.
>
> The official documentation states that the retention policy is to retain
> the messages that were *acknowledged*. For example, if the retention size
> is set to 10GB and there are 20GB of messages acknowledged, Pulsar will
> retain 10GB and delete the rest.
>
> However, the actual code implementation is different. It retains the
> messages that were *written* to the ledger, including *backlog messages*
> and *acknowledged messages*. For instance, if there are 10GB of messages in
> the backlog and 10GB of messages were acknowledged:
> 1. If the retention size is set to 10GB, Pulsar will only retain the 10GB
> of messages in the backlog, and the 10GB of messages that were acknowledged
> will be deleted.
> 2. If the retention size is set to 20GB, Pulsar will retain the 10GB of
> messages in the backlog and the 10GB of messages that were acknowledged.
> 3. If the retention size is set to 5GB, Pulsar will retain the 10GB of
> messages in the backlog, but the 10GB of messages that were acknowledged
> will be deleted.
> 4. If the retention size is set to 15GB, Pulsar will retain the 10GB of
> messages in the backlog and the 5GB of messages that were acknowledged. The
> rest of the acknowledged messages will be deleted.
>
> From Pulsar open source to the present, the code implementation has never
> changed, but the meaning of the official documentation has gradually
> shifted. So I'm just considering which one is better: the official
> documentation or the code implementation? Does the change in the meaning of
> the document align more with expectations? Does it indicate that users want
> to retain the messages that were acknowledged?
>
> For a long time, users have believed that the Retention Policy is for
> retaining messages that were acknowledged. If we change the document to
> match the code implementation, will it meet users' expectations?
>
> What should we do? Change the document to match the code implementation or
> change the code implementation to match the document?
>
> Regards,
> Tao Jiuming
>


Re: [Discuss] Pulsar retention policy

2024-04-15 Thread
Hi Yike,

The current code implementation of the retention policy looks a little
strange to me.

The biggest problem is that we have coupled the backlog quota and the
retention policy together:
we cannot retain historical data without setting a backlog quota. Say I
want to retain 10GB of acknowledged messages;
then I have to set a backlog quota.

The backlog quota will block message publishing or acknowledge messages
automatically, which in some cases is unacceptable.

Personally, I prefer the description of the retention policy in the
official documentation; it is independent.

Thanks,
Tao Jiuming

Yike Xiao wrote on Sat, Apr 13, 2024 at 23:32:

> Hi Jiuming,
>
> Thank you for bringing this up. From a Pulsar admin perspective, the
> current retention policy implementation does not ensure that users can seek
> back to a position within a specific size limit, or it makes them pay extra
> cost to achieve that. For example, to guarantee being able to seek back to
> a position 10GB earlier, users need to set `retention policy = backlog
> quota + 10GB`. However, the backlog quota is typically set quite large to
> allow for significant data accumulation. Therefore, users must bear the
> cost of a large backlog quota (e.g., 100GB) to ensure they can revert to a
> position 10GB earlier, even if there is no backlog in the subscription.
>
> Regards,
> Yike
> ____
> From: 太上玄元道君 
> Sent: Thursday, April 11, 2024 18:20
> To: dev@pulsar.apache.org 
> Subject: [Discuss] Pulsar retention policy
>
> Hi, Pulsar community,
>
> I'm opening this thread to discuss the retention policy for managed
> ledgers.
>
> Currently, the retention policy is defined as a time/size-based policy to
> retain messages in the ledger, but there is a difference between the
> official documentation and the actual code implementation.
>
> The official documentation states that the retention policy is to retain
> the messages that were *acknowledged*. For example, if the retention size
> is set to 10GB and there are 20GB of messages acknowledged, Pulsar will
> retain 10GB and delete the rest.
>
> However, the actual code implementation is different. It retains the
> messages that were *written* to the ledger, including *backlog messages*
> and *acknowledged messages*. For instance, if there are 10GB of messages in
> the backlog and 10GB of messages were acknowledged:
> 1. If the retention size is set to 10GB, Pulsar will only retain the 10GB
> of messages in the backlog, and the 10GB of messages that were acknowledged
> will be deleted.
> 2. If the retention size is set to 20GB, Pulsar will retain the 10GB of
> messages in the backlog and the 10GB of messages that were acknowledged.
> 3. If the retention size is set to 5GB, Pulsar will retain the 10GB of
> messages in the backlog, but the 10GB of messages that were acknowledged
> will be deleted.
> 4. If the retention size is set to 15GB, Pulsar will retain the 10GB of
> messages in the backlog and the 5GB of messages that were acknowledged. The
> rest of the acknowledged messages will be deleted.
>
> From Pulsar open source to the present, the code implementation has never
> changed, but the meaning of the official documentation has gradually
> shifted. So I'm just considering which one is better: the official
> documentation or the code implementation? Does the change in the meaning of
> the document align more with expectations? Does it indicate that users want
> to retain the messages that were acknowledged?
>
> For a long time, users have believed that the Retention Policy is for
> retaining messages that were acknowledged. If we change the document to
> match the code implementation, will it meet users' expectations?
>
> What should we do? Change the document to match the code implementation or
> change the code implementation to match the document?
>
> Regards,
> Tao Jiuming
>


[Discuss] Pulsar retention policy

2024-04-11 Thread
Hi, Pulsar community,

I'm opening this thread to discuss the retention policy for managed ledgers.

Currently, the retention policy is defined as a time/size-based policy to
retain messages in the ledger, but there is a difference between the
official documentation and the actual code implementation.

The official documentation states that the retention policy is to retain
the messages that were *acknowledged*. For example, if the retention size
is set to 10GB and there are 20GB of messages acknowledged, Pulsar will
retain 10GB and delete the rest.

However, the actual code implementation is different. It retains the
messages that were *written* to the ledger, including *backlog messages*
and *acknowledged messages*. For instance, if there are 10GB of messages in
the backlog and 10GB of messages were acknowledged:
1. If the retention size is set to 10GB, Pulsar will only retain the 10GB
of messages in the backlog, and the 10GB of messages that were acknowledged
will be deleted.
2. If the retention size is set to 20GB, Pulsar will retain the 10GB of
messages in the backlog and the 10GB of messages that were acknowledged.
3. If the retention size is set to 5GB, Pulsar will retain the 10GB of
messages in the backlog, but the 10GB of messages that were acknowledged
will be deleted.
4. If the retention size is set to 15GB, Pulsar will retain the 10GB of
messages in the backlog and the 5GB of messages that were acknowledged. The
rest of the acknowledged messages will be deleted.

From Pulsar open source to the present, the code implementation has never
changed, but the meaning of the official documentation has gradually
shifted. So I'm just considering which one is better: the official
documentation or the code implementation? Does the change in the meaning of
the document align more with expectations? Does it indicate that users want
to retain the messages that were acknowledged?

For a long time, users have believed that the Retention Policy is for
retaining messages that were acknowledged. If we change the document to
match the code implementation, will it meet users' expectations?

What should we do? Change the document to match the code implementation or
change the code implementation to match the document?

Regards,
Tao Jiuming
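The four numbered cases above can be summarized: the backlog is always retained, and acknowledged messages only consume whatever retention budget remains after the backlog. A small sketch (editor's illustration, not Pulsar code; `retainedAckedGb` is a hypothetical helper):

```java
// Sketch of the *actual* size-based retention behavior described in the
// four examples above: acknowledged data is kept only with the retention
// budget left over after the (always-retained) backlog.
public class RetentionSemanticsSketch {
    static long retainedAckedGb(long backlogGb, long ackedGb, long retentionGb) {
        // Budget left for acknowledged data, never negative, never more
        // than the acknowledged data that actually exists.
        return Math.max(0, Math.min(ackedGb, retentionGb - backlogGb));
    }

    public static void main(String[] args) {
        // backlog = 10GB, acknowledged = 10GB, as in the examples above
        System.out.println(retainedAckedGb(10, 10, 10)); // 0  (case 1)
        System.out.println(retainedAckedGb(10, 10, 20)); // 10 (case 2)
        System.out.println(retainedAckedGb(10, 10, 5));  // 0  (case 3)
        System.out.println(retainedAckedGb(10, 10, 15)); // 5  (case 4)
    }
}
```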


Re: [DISCUSS] Remove the limitation between backlog quota and retention policy

2024-04-10 Thread
The limitation seems to make no sense. After
http://github.com/apache/pulsar/pull/22472 is merged, I'll try to fix it.

Thanks,
Tao Jiuming

Yike Xiao wrote on Wed, Apr 10, 2024 at 16:58:

> Here's a hint as to why this restriction was introduced.
> After reviewing the current implementation of the retention policy, I
> found that the logic for determining whether a ledger exceeds the retention
> size and should be deleted is based on the following logic (pseudocode):
>
> ```
> long totalSizeToDelete = 0;
> for (ledger in ledgersBeforeSlowestReaderLedgerId) {
>   totalSizeToDelete += ledger.size;
>   if (managedLedgerTotalSize - totalSizeToDelete >= retentionSize) {
> // add this ledger to ledgersToDelete list
>   }
> }
> ```
>
> The actual code can be seen at: [1]
>
> The issue lies in the use of `managedLedgerTotalSize - totalSizeToDelete
> >= retentionSize`, where `managedLedgerTotalSize` is used instead of
> `SUM(ledgerSize(ledgersBeforeSlowestReaderLedgerId))`. This causes ledgers
> that have not been acknowledged to also be included in the retention size
> comparison.
>
> Here is an example to illustrate the problem:
> Suppose the current ManagedLedger has 5 ledgers, [1, 2, 3, 4, 5], each
> ledger's size is 1MB, and the retentionSize is set to 2MB. The current
> slowestReaderLedgerId is 3. Based on the current implementation, ledgers 1
> and 2 are expected to be retained but will be deleted.
> The retention policy will only actually take effect when the retentionSize
> exceeds the backlog quota.
>
> [1]:
> https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2695-L2696
>
> Regards,
> Yike
> 
> From: Yike Xiao 
> Sent: Wednesday, April 3, 2024 15:36
> To: dev@pulsar.apache.org 
> Subject: [DISCUSS] Remove the limitation between backlog quota and
> retention policy
>
> Hi all,
>
> In Pulsar, backlogs consist of unacknowledged messages. Once a message is
> acknowledged by all subscriptions, it enters the retention stage, as
> clearly illustrated in the official documentation [1]. The backlog quota
> limits the size and/or time threshold of the backlog, while the retention
> policy dictates the duration or volume of messages (acknowledged by all
> subscriptions) to retain.
>
> However, I encountered an undocumented restriction that mandates the
> backlog quota to be greater than the retention policy. For example, a
> namespace with a backlog quota 20GB in size and unlimited in time cannot
> have a 2GB in size and 7 days in time retention policy. This detail is
> implemented in the code [3].
>
> This restriction poses challenges in scenarios where there's a need for a
> larger backlog quota to accommodate extensive usage while desiring a
> shorter retention period for traceback purposes. Such configurations seem
> unattainable under the current constraint.
>
> Given that this limitation has been part of Pulsar since its initial
> open-source release, I am curious about the rationale behind it.
> Understanding the original intent or the technical considerations that led
> to this restriction would be immensely helpful.
>
> Therefore, I propose reevaluating and potentially removing this
> restriction to allow for more flexible configurations of backlog quota and
> retention policy.
>
> I would greatly appreciate your thoughts on this matter.
>
> Thank you for your time and consideration. I look forward to your response
> and am eager to contribute further to this discussion.
>
> [1]:
> https://pulsar.apache.org/docs/next/cookbooks-retention-expiry/#retention-policies
> [3]:
> https://github.com/apache/pulsar/blob/7315aeb6258b7adc9d874268d50acb95ffc0cf2b/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/AdminResource.java#L374-L393
>
> Regards,
> Yike
>
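The pseudocode and the 5-ledger example quoted above can be reproduced with a small sketch (editor's illustration; `ledgersToDelete` is a hypothetical helper, not Pulsar's API):

```java
import java.util.ArrayList;
import java.util.List;

// Reproduces the retention check from the email's pseudocode with the
// 5-ledger example: ledgers of 1MB each, retentionSize = 2MB, slowest
// reader at ledger 3. Because managedLedgerTotalSize (5MB) is used in the
// comparison rather than the 2MB before the slowest reader, ledgers 1 and
// 2 both land in the delete list despite the 2MB retention setting.
public class RetentionCheckSketch {
    static List<Integer> ledgersToDelete(long[] sizesBeforeSlowestReader,
                                         long managedLedgerTotalSize,
                                         long retentionSize) {
        List<Integer> toDelete = new ArrayList<>();
        long totalSizeToDelete = 0;
        for (int i = 0; i < sizesBeforeSlowestReader.length; i++) {
            totalSizeToDelete += sizesBeforeSlowestReader[i];
            if (managedLedgerTotalSize - totalSizeToDelete >= retentionSize) {
                toDelete.add(i + 1); // ledger ids are 1-based in the example
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        // ledgers 1 and 2 precede the slowest reader (ledger 3)
        System.out.println(ledgersToDelete(new long[]{mb, mb}, 5 * mb, 2 * mb));
        // prints [1, 2]: both deleted, matching the behavior described above
    }
}
```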


Re: [VOTE] PIP-345: Optimize finding message by timestamp

2024-03-25 Thread
Hi Penghui,

Thanks for your feedback!

I'm not sure about this either, since publishTimestamp is a messaging-layer
concept, and the ManagedLedger, as a persistence layer, should not be aware
of it.

But in the ManagedLedger I noticed some methods that search messages by
publish timestamp (say,
ManagedLedgerImpl#getEarliestMessagePublishTimeInBacklog),
which is why I want to add publishTimestamp to the ManagedLedger.

Introducing a secondary index to the ManagedLedger is a good idea; RocketMQ
has a `Hash index`, and Kafka has a `Sparse index`.

For finding a message by timestamp, we can introduce a `sparse index` to
Pulsar: after adding entries completes, add an index entry to
`ManagedLedgerIndex` and store the index in the ManagedLedger. What do you
think?

Thanks,
Tao Jiuming
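A minimal sketch of the sparse-index idea floated above (editor's illustration with hypothetical names, not a Pulsar or PIP-345 API): sample one (publishTimestamp, entryId) pair every N entries, then start the scan at the floor sample instead of at the beginning of the ledger:

```java
import java.util.Map;
import java.util.TreeMap;

// Sparse publish-timestamp index sketch: every `interval`-th entry records
// a (publishTimestamp -> entryId) sample; a floor lookup then gives a scan
// starting point near the target timestamp.
public class SparseTimestampIndexSketch {
    private final TreeMap<Long, Long> samples = new TreeMap<>();
    private final int interval;
    private long count = 0;

    SparseTimestampIndexSketch(int interval) {
        this.interval = interval;
    }

    // Called after an add-entry completes; records every interval-th entry.
    void onAddEntryComplete(long publishTimestamp, long entryId) {
        if (count++ % interval == 0) {
            samples.put(publishTimestamp, entryId);
        }
    }

    // Entry id where a scan for `timestamp` should start (0 if no sample yet).
    long scanStartEntry(long timestamp) {
        Map.Entry<Long, Long> floor = samples.floorEntry(timestamp);
        return floor == null ? 0L : floor.getValue();
    }

    public static void main(String[] args) {
        SparseTimestampIndexSketch idx = new SparseTimestampIndexSketch(100);
        for (long id = 0; id < 1000; id++) {
            idx.onAddEntryComplete(1000 + id * 10, id); // monotonic timestamps
        }
        // Looking up entry 555's timestamp starts the scan at sample entry 500.
        System.out.println(idx.scanStartEntry(1000 + 555 * 10)); // 500
    }
}
```

The scan still has to read forward from the sampled entry, but it is bounded by the sampling interval rather than the whole ledger.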



PengHui Li wrote on Mon, Mar 25, 2024 at 15:17:

> Hi, Jiuming
>
> I'm sorry for not getting back to you sooner.
>
> First, I support the motivation to optimize this case because it could be
> a significant blocker for users who want infinite data retention, which is
> a BIG differentiator from Apache Kafka. And I have really seen cases with
> high publish throughput where one ledger could even hold 1M entries, with
> 100M new entries published to a topic.
>
> Then I tried to check the details of the existing implementation. I think
> the tricky part is that the publish time is not a ManagedLedger concept. I
> saw that the changes you proposed will add publish time to the
> ManagedLedger module, which doesn't look good to me, because it will
> couple the Pulsar concept with the ManagedLedger concept.
>
> Essentially, the publish time could be a secondary index of the
> ManagedLedger.
> My opinion is to have a general ManagedLedgerIndex abstraction, and the
> Pulsar broker can create any index it wants. Since the broker creates the
> index, the broker can control the index's behavior. Then, the
> ManagedLedger can provide an API to search for an entry with a
> ManagedLedgerIndex. With this option, we don't need to add the publish
> time concept to ManagedLedger directly.
>
> In this case, if the broker tries to search for an entry with a predicate
> and an index, the managed ledger will search the index first. Of course,
> if the relevant entry cannot be found in the index, it just falls back to
> the "optimized full scan".
>
> Regards,
> Penghui
>
>
> On Mon, Mar 25, 2024 at 11:51 AM 太上玄元道君  wrote:
>
> > bump
> >
> > 太上玄元道君 wrote on Wed, Mar 20, 2024 at 16:23:
> >
> > > bump
> > >
> > > 太上玄元道君 wrote on Tue, Mar 19, 2024 at 19:35:
> > >
> > >> Hi Pulsar community,
> > >>
> > >> This thread is to start a vote for PIP-345: Optimize finding message
> > >> by timestamp
> > >>
> > >> PIP: https://github.com/apache/pulsar/pull/22234
> > >> Discuss thread:
> > >> https://lists.apache.org/thread/5owc9os6wmy52zxbv07qo2jrfjm17hd2
> > >>
> > >> Thanks,
> > >> Tao Jiuming
> > >>
> > >
> >
>


Re: [VOTE] PIP-345: Optimize finding message by timestamp

2024-03-24 Thread
bump

太上玄元道君 wrote on Wed, Mar 20, 2024 at 16:23:

> bump
>
> 太上玄元道君 wrote on Tue, Mar 19, 2024 at 19:35:
>
>> Hi Pulsar community,
>>
>> This thread is to start a vote for PIP-345: Optimize finding message by
>> timestamp
>>
>> PIP: https://github.com/apache/pulsar/pull/22234
>> Discuss thread:
>> https://lists.apache.org/thread/5owc9os6wmy52zxbv07qo2jrfjm17hd2
>>
>> Thanks,
>> Tao Jiuming
>>
>


Re: [VOTE] PIP-345: Optimize finding message by timestamp

2024-03-20 Thread
bump

太上玄元道君 wrote on Tue, Mar 19, 2024 at 19:35:

> Hi Pulsar community,
>
> This thread is to start a vote for PIP-345: Optimize finding message by
> timestamp
>
> PIP: https://github.com/apache/pulsar/pull/22234
> Discuss thread:
> https://lists.apache.org/thread/5owc9os6wmy52zxbv07qo2jrfjm17hd2
>
> Thanks,
> Tao Jiuming
>


[VOTE] PIP-345: Optimize finding message by timestamp

2024-03-19 Thread
Hi Pulsar community,

This thread is to start a vote for PIP-345: Optimize finding message by
timestamp

PIP: https://github.com/apache/pulsar/pull/22234
Discuss thread:
https://lists.apache.org/thread/5owc9os6wmy52zxbv07qo2jrfjm17hd2

Thanks,
Tao Jiuming


Re: [DISCUSS] PIP-345: Optimize finding message by timestamp

2024-03-17 Thread
Hi Yubiao,

Thanks for your review!

1. As I mentioned in the *Alternatives* part of the PIP:

`LedgerInfo#timestamp` is the broker's timestamp, while finding a message by
timestamp uses the client's timestamp (publishTimestamp).
We have to consider that the client's clock may not be synchronized with the
broker's clock; pre-filtering by the broker's timestamp while searching for a
message by publishTimestamp doesn't make sense.

2. Why do we need both `minPublishTimestamp` and `maxPublishTimestamp`?

As in my earlier reply to Girish, the next ledger's `minPublishTimestamp`
might be less than the previous ledger's `maxPublishTimestamp`,
so we can't use the next ledger's `minPublishTimestamp` as the previous
ledger's `maxPublishTimestamp`. I want to make that clear.

Thanks,
Tao Jiuming

Yubiao Feng  于2024年3月16日周六 20:29写道:

> Hi Jiuming
>
> Firstly, I think the idea you provided is great.
>
> It seems the field `beginPublishTimestamp` is also not needed, and there is
> an existing field `ledgerInfo.timestamp` to use,
> - The current ledger's timestamp can be used as `beginPublishTimestamp`
> - The next ledger's timestamp can be used as `endPublishTimestamp`
>
> Thanks
> Yubiao Feng
>
>
> On Fri, Mar 15, 2024 at 1:57 PM 太上玄元道君  wrote:
>
> > Hi, Girish,
> >
> > Thanks for your feedback!
> >
> > In general, it's a very good suggestion, we can just use one single
> > `beginPublishTimestamp` to achieve our goal,
> > but the actual problem will be a bit more complex.
> >
> > Actually, the naming of `beginPublishTimestamp` and `endPublishTimestamp`
> > has a little problem,
> > it should be `minPublishTimestamp` and `maxPublishTimestamp`.
> >
> > In some cases, next ledger's `minPublishTimestamp` may less than it's
> > previous ledger's `maxPublishTimestamp`,
> > so we have to maintain both `minPublishTimestamp` and
> > `maxPublishTimestamp`.
> >
> > Say, there are 2 producers publishing to the topic, Producer1 send
> > *message1* to the topic, broker received
> > *message1* immediately and persist it to the ledger. Producer2 send
> > *message2* to the broker *before* *message1*,
> > but for some reason, broker received *message2* after a while.
> > At the same time, Ledger switching happens, the previous ledger's
> > `maxPublishTimestamp` is *message1*'s publishTimestamp,
> > and the current ledger's `minPublishTimestamp` is *message2*'s
> > publishTimestamp,
> > so the current ledger's `minPublishTimestamp` is less than the previous
> > ledger's `maxPublishTimestamp`, right?
> >
> > If we just have a single field  `minPublishTimestamp`, it will have a
> > hidden meaning: the next ledger's `minPublishTimestamp`
> > is it's previous ledger's `maxPublishTimestamp`, it's incorrect.
> > So we want to introduce `minPublishTimestamp` and `maxPublishTimestamp`
> to
> > make it clear.
> >
> > Thanks,
> > Tao Jiuming
> >
> > Girish Sharma  于2024年3月15日周五 12:14写道:
> >
> > > One suggestion, I think you can make do with storing just begin
> > timestamp.
> > > Any search utilising these values will work the same way with just one
> of
> > > those timestamps compared to both begin and end.
> > >
> > > Any particular reason you need both the timestamps?
> > >
> > > Regards
> > >
> > > On Fri, Mar 15, 2024, 9:39 AM 太上玄元道君  wrote:
> > >
> > > > bump
> > > >
> > > > 太上玄元道君 于2024年3月10日 周日06:41写道:
> > > >
> > > > > Hi Pulsar community,
> > > > >
> > > > > A new PIP is opened, this thread is to discuss PIP-345: Optimize
> > > finding
> > > > > message by timestamp.
> > > > >
> > > > > Motivation:
> > > > > Finding message by timestamp is widely used in Pulsar:
> > > > > * It is used by the `pulsar-admin` tool to get the message id by
> > > > > timestamp, expire messages by timestamp, and reset cursor.
> > > > > * It is used by the `pulsar-client` to reset the subscription to a
> > > > > specific timestamp.
> > > > > * And also used by the `expiry-monitor` to find the messages that
> are
> > > > > expired.
> > > > > Even though the current implementation is correct, and using binary
> > > > search
> > > > > to speed-up, but it's still not efficient *enough*.
> > > > > The current implementation is to scan all the ledgers to find the
> > > message
> > > > > by timestamp.
> > > > > This is a performance bottleneck, especially for large topics with
> > many
> > > > > messages.
> > > > > Say, if there is a topic which has 1m entries, through the binary
> > > search,
> > > > > it will take 20 iterations to find the message.
> > > > > In some extreme cases, it may lead to a timeout, and the client
> will
> > > not
> > > > > be able to seeking by timestamp.
> > > > >
> > > > > PIP: https://github.com/apache/pulsar/pull/22234
> > > > >
> > > > > Your feedback is very important to us, please take a moment to
> review
> > > the
> > > > > proposal and provide your thoughts.
> > > > >
> > > > > Thanks,
> > > > > Tao Jiuming
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] PIP-345: Optimize finding message by timestamp

2024-03-14 Thread
Hi, Girish,

Thanks for your feedback!

In general, it's a very good suggestion: we could use a single
`beginPublishTimestamp` to achieve our goal,
but the actual problem is a bit more complex.

Actually, the naming of `beginPublishTimestamp` and `endPublishTimestamp`
is slightly off;
they should be `minPublishTimestamp` and `maxPublishTimestamp`.

In some cases, the next ledger's `minPublishTimestamp` may be less than the
previous ledger's `maxPublishTimestamp`,
so we have to maintain both `minPublishTimestamp` and `maxPublishTimestamp`.

Say there are 2 producers publishing to the topic. Producer1 sends
*message1* to the topic; the broker receives
*message1* immediately and persists it to the ledger. Producer2 sends
*message2* to the broker *before* *message1*,
but for some reason the broker receives *message2* only after a while.
In the meantime, ledger switching happens: the previous ledger's
`maxPublishTimestamp` is *message1*'s publishTimestamp,
and the current ledger's `minPublishTimestamp` is *message2*'s
publishTimestamp,
so the current ledger's `minPublishTimestamp` is less than the previous
ledger's `maxPublishTimestamp`, right?

If we had only a single field, `minPublishTimestamp`, it would carry a
hidden assumption: that the next ledger's `minPublishTimestamp`
is the previous ledger's `maxPublishTimestamp`, which is incorrect.
So we introduce both `minPublishTimestamp` and `maxPublishTimestamp` to
make this explicit.
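The overlap scenario above can be sketched in a few lines of Java. This is a
hypothetical helper, not the PIP's actual API — `LedgerRange` and
`candidates` are illustrative names: only ledgers whose [min, max] interval
can contain the target timestamp survive the pre-filter, and because
adjacent ranges may overlap, more than one ledger can survive.

```java
import java.util.ArrayList;
import java.util.List;

public class LedgerRangeFilter {
    // Illustrative ledger record: id plus the proposed min/max publish timestamps.
    record LedgerRange(long ledgerId, long minPublishTimestamp, long maxPublishTimestamp) {}

    // Keep only the ledgers whose [min, max] interval could contain the target
    // publish timestamp. Adjacent ranges may overlap (out-of-order arrival), so
    // more than one ledger can survive; the caller then binary-searches entries
    // inside the surviving ledgers only, instead of scanning every ledger.
    static List<LedgerRange> candidates(List<LedgerRange> ledgers, long target) {
        List<LedgerRange> out = new ArrayList<>();
        for (LedgerRange l : ledgers) {
            if (target >= l.minPublishTimestamp() && target <= l.maxPublishTimestamp()) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Ledger 2's min (95) is below ledger 1's max (100): the two-producer
        // overlap described above. A single "begin" timestamp per ledger would
        // wrongly exclude ledger 1 when searching for publish timestamp 97.
        List<LedgerRange> ledgers = List.of(
                new LedgerRange(1, 10, 100),
                new LedgerRange(2, 95, 200));
        System.out.println(candidates(ledgers, 97).size()); // prints 2
    }
}
```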

Thanks,
Tao Jiuming

Girish Sharma  于2024年3月15日周五 12:14写道:

> One suggestion, I think you can make do with storing just begin timestamp.
> Any search utilising these values will work the same way with just one of
> those timestamps compared to both begin and end.
>
> Any particular reason you need both the timestamps?
>
> Regards
>
> On Fri, Mar 15, 2024, 9:39 AM 太上玄元道君  wrote:
>
> > bump
> >
> > 太上玄元道君 于2024年3月10日 周日06:41写道:
> >
> > > Hi Pulsar community,
> > >
> > > A new PIP is opened, this thread is to discuss PIP-345: Optimize
> finding
> > > message by timestamp.
> > >
> > > Motivation:
> > > Finding message by timestamp is widely used in Pulsar:
> > > * It is used by the `pulsar-admin` tool to get the message id by
> > > timestamp, expire messages by timestamp, and reset cursor.
> > > * It is used by the `pulsar-client` to reset the subscription to a
> > > specific timestamp.
> > > * And also used by the `expiry-monitor` to find the messages that are
> > > expired.
> > > Even though the current implementation is correct, and using binary
> > search
> > > to speed-up, but it's still not efficient *enough*.
> > > The current implementation is to scan all the ledgers to find the
> message
> > > by timestamp.
> > > This is a performance bottleneck, especially for large topics with many
> > > messages.
> > > Say, if there is a topic which has 1m entries, through the binary
> search,
> > > it will take 20 iterations to find the message.
> > > In some extreme cases, it may lead to a timeout, and the client will
> not
> > > be able to seeking by timestamp.
> > >
> > > PIP: https://github.com/apache/pulsar/pull/22234
> > >
> > > Your feedback is very important to us, please take a moment to review
> the
> > > proposal and provide your thoughts.
> > >
> > > Thanks,
> > > Tao Jiuming
> > >
> >
>


Re: [DISCUSS] PIP-345: Optimize finding message by timestamp

2024-03-14 Thread
bump

太上玄元道君 于2024年3月10日 周日06:41写道:

> Hi Pulsar community,
>
> A new PIP is opened, this thread is to discuss PIP-345: Optimize finding
> message by timestamp.
>
> Motivation:
> Finding message by timestamp is widely used in Pulsar:
> * It is used by the `pulsar-admin` tool to get the message id by
> timestamp, expire messages by timestamp, and reset cursor.
> * It is used by the `pulsar-client` to reset the subscription to a
> specific timestamp.
> * And also used by the `expiry-monitor` to find the messages that are
> expired.
> Even though the current implementation is correct, and using binary search
> to speed-up, but it's still not efficient *enough*.
> The current implementation is to scan all the ledgers to find the message
> by timestamp.
> This is a performance bottleneck, especially for large topics with many
> messages.
> Say, if there is a topic which has 1m entries, through the binary search,
> it will take 20 iterations to find the message.
> In some extreme cases, it may lead to a timeout, and the client will not
> be able to seeking by timestamp.
>
> PIP: https://github.com/apache/pulsar/pull/22234
>
> Your feedback is very important to us, please take a moment to review the
> proposal and provide your thoughts.
>
> Thanks,
> Tao Jiuming
>


Re: [VOTE] PIP-342: Support OpenTelemetry metrics in Pulsar client

2024-03-13 Thread
+1 nonbinding

Matteo Merli  于2024年3月14日周四 09:45写道:

> PIP: https://github.com/apache/pulsar/pull/22178
>
> WIP PR: https://github.com/apache/pulsar/pull/22179
>
>
>
>
> --
> Matteo Merli
> 
>


Re: [DISCUSS] PIP-345: Optimize finding message by timestamp

2024-03-11 Thread
Hi Dave,
Thanks for your review!

Perhaps it's because I wrote out the steps in detail, but the key points are:
1. Deserialize MessageMetadata once the broker receives a message
2. Pass the MessageMetadata to the `PublishContext`
3. After adding the entries finishes, get the `publishTimestamp` from
`PublishContext#messageMetadata`, and
update the `beginPublishTimestamp`/`endPublishTimestamp` of the `Ledger`

Since we may already deserialize MessageMetadata during message publishing on
the broker side (PersistentTopic#isExceedMaximumDeliveryDelay,
MessageDeduplication), deserializing MessageMetadata once when the broker
receives a message helps
reduce the number of MessageMetadata deserializations in some cases.
Maintaining these new ledger fields looks like this:
```
public class ManagedLedgerImpl {
    // New field:
    // a map recording the begin/end publish timestamp of each ledger.
    private final NavigableMap<Long, MutablePair<Long, Long>> publishTimestamps
            = new ConcurrentSkipListMap<>();

    // New method:
    // update the begin/end publish timestamp of the ledger after the entry
    // is added to the ledger.
    protected void updatePublishTimestamp(long ledgerId, long publishTimestamp) {
        MutablePair<Long, Long> pair = publishTimestamps.computeIfAbsent(
                ledgerId, k -> new MutablePair<>(Long.MAX_VALUE, Long.MIN_VALUE));
        pair.setLeft(Math.min(pair.getLeft(), publishTimestamp));
        pair.setRight(Math.max(pair.getRight(), publishTimestamp));
    }
}
```
I just use a map to maintain it; when closing a ledger, its
`beginPublishTimestamp`/`endPublishTimestamp` are written to `LedgerInfo`.
Beyond that, no additional overhead is introduced.

So, if you are asking about the time spent, I would say it is *nearly* zero.

Thanks,
Tao Jiuming

Dave Fisher  于2024年3月12日周二 10:50写道:

> What can you say about the time spent to maintain these new ledger fields?
> I think you are asking to modify the main message logic, which is highly
> optimized, but I’m not sure. Have you tried your code on your own
> hardware? Do you have performance comparisons with the normal flow?
>
> > On Mar 11, 2024, at 7:41 PM, 太上玄元道君  wrote:
> >
> > bump
> >
> > 太上玄元道君 于2024年3月11日 周一17:55写道:
> >
> >> bump
> >>
> >> 太上玄元道君  于2024年3月10日周日 06:41写道:
> >>
> >>> Hi Pulsar community,
> >>>
> >>> A new PIP is opened, this thread is to discuss PIP-345: Optimize
> finding
> >>> message by timestamp.
> >>>
> >>> Motivation:
> >>> Finding message by timestamp is widely used in Pulsar:
> >>> * It is used by the `pulsar-admin` tool to get the message id by
> >>> timestamp, expire messages by timestamp, and reset cursor.
> >>> * It is used by the `pulsar-client` to reset the subscription to a
> >>> specific timestamp.
> >>> * And also used by the `expiry-monitor` to find the messages that are
> >>> expired.
> >>> Even though the current implementation is correct, and using binary
> >>> search to speed-up, but it's still not efficient *enough*.
> >>> The current implementation is to scan all the ledgers to find the
> message
> >>> by timestamp.
> >>> This is a performance bottleneck, especially for large topics with many
> >>> messages.
> >>> Say, if there is a topic which has 1m entries, through the binary
> search,
> >>> it will take 20 iterations to find the message.
> >>> In some extreme cases, it may lead to a timeout, and the client will
> not
> >>> be able to seeking by timestamp.
> >>>
> >>> PIP: https://github.com/apache/pulsar/pull/22234
> >>>
> >>> Your feedback is very important to us, please take a moment to review
> the
> >>> proposal and provide your thoughts.
> >>>
> >>> Thanks,
> >>> Tao Jiuming
> >>>
> >>
>
>


Re: [DISCUSS] PIP-345: Optimize finding message by timestamp

2024-03-11 Thread
bump

太上玄元道君 于2024年3月11日 周一17:55写道:

> bump
>
> 太上玄元道君  于2024年3月10日周日 06:41写道:
>
>> Hi Pulsar community,
>>
>> A new PIP is opened, this thread is to discuss PIP-345: Optimize finding
>> message by timestamp.
>>
>> Motivation:
>> Finding message by timestamp is widely used in Pulsar:
>> * It is used by the `pulsar-admin` tool to get the message id by
>> timestamp, expire messages by timestamp, and reset cursor.
>> * It is used by the `pulsar-client` to reset the subscription to a
>> specific timestamp.
>> * And also used by the `expiry-monitor` to find the messages that are
>> expired.
>> Even though the current implementation is correct, and using binary
>> search to speed-up, but it's still not efficient *enough*.
>> The current implementation is to scan all the ledgers to find the message
>> by timestamp.
>> This is a performance bottleneck, especially for large topics with many
>> messages.
>> Say, if there is a topic which has 1m entries, through the binary search,
>> it will take 20 iterations to find the message.
>> In some extreme cases, it may lead to a timeout, and the client will not
>> be able to seeking by timestamp.
>>
>> PIP: https://github.com/apache/pulsar/pull/22234
>>
>> Your feedback is very important to us, please take a moment to review the
>> proposal and provide your thoughts.
>>
>> Thanks,
>> Tao Jiuming
>>
>


Re: [DISCUSS] PIP-345: Optimize finding message by timestamp

2024-03-11 Thread
bump

太上玄元道君  于2024年3月10日周日 06:41写道:

> Hi Pulsar community,
>
> A new PIP is opened, this thread is to discuss PIP-345: Optimize finding
> message by timestamp.
>
> Motivation:
> Finding message by timestamp is widely used in Pulsar:
> * It is used by the `pulsar-admin` tool to get the message id by
> timestamp, expire messages by timestamp, and reset cursor.
> * It is used by the `pulsar-client` to reset the subscription to a
> specific timestamp.
> * And also used by the `expiry-monitor` to find the messages that are
> expired.
> Even though the current implementation is correct, and using binary search
> to speed-up, but it's still not efficient *enough*.
> The current implementation is to scan all the ledgers to find the message
> by timestamp.
> This is a performance bottleneck, especially for large topics with many
> messages.
> Say, if there is a topic which has 1m entries, through the binary search,
> it will take 20 iterations to find the message.
> In some extreme cases, it may lead to a timeout, and the client will not
> be able to seeking by timestamp.
>
> PIP: https://github.com/apache/pulsar/pull/22234
>
> Your feedback is very important to us, please take a moment to review the
> proposal and provide your thoughts.
>
> Thanks,
> Tao Jiuming
>


[DISCUSS] PIP-345: Optimize finding message by timestamp

2024-03-09 Thread
Hi Pulsar community,

A new PIP is opened, this thread is to discuss PIP-345: Optimize finding
message by timestamp.

Motivation:
Finding message by timestamp is widely used in Pulsar:
* It is used by the `pulsar-admin` tool to get the message id by
timestamp, expire messages by timestamp, and reset cursor.
* It is used by the `pulsar-client` to reset the subscription to a specific
timestamp.
* And also used by the `expiry-monitor` to find the messages that are
expired.
Even though the current implementation is correct and uses binary search to
speed things up, it's still not efficient *enough*.
The current implementation scans all the ledgers to find the message by
timestamp.
This is a performance bottleneck, especially for large topics with many
messages.
Say there is a topic with 1M entries: the binary search will take about 20
iterations to find the message.
In some extreme cases, it may lead to a timeout, and the client will not be
able to seek by timestamp.
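As a sanity check on the 20-iteration figure (an illustrative sketch, not
Pulsar code): a binary search over n entries takes floor(log2(n)) + 1 probes
in the worst case, which is exactly 20 for 1,000,000 entries.

```java
public class SeekIterations {
    // Worst-case number of binary-search probes over n entries:
    // keep discarding the lower half until the range is empty.
    static int iterations(long n) {
        int steps = 0;
        long lo = 0, hi = n - 1;
        while (lo <= hi) {
            steps++;
            long mid = (lo + hi) >>> 1;
            lo = mid + 1; // worst case: target lies beyond every probed entry
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(iterations(1_000_000L)); // prints 20
    }
}
```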

PIP: https://github.com/apache/pulsar/pull/22234

Your feedback is very important to us; please take a moment to review the
proposal and share your thoughts.

Thanks,
Tao Jiuming


Re: [VOTE] PIP-343: Use picocli instead of jcommander

2024-03-06 Thread
+1 nonbinding

Zixuan Liu 于2024年3月6日 周三23:04写道:

> Hello,
>
> A new proposal to improve the CLI user experience.
>
> PIP: https://github.com/apache/pulsar/pull/22181
> Discussion thread:
> https://lists.apache.org/thread/ydg1q064cd11pxwz693frtk4by74q32f
>
> PR: https://github.com/apache/pulsar/pull/22209
>
> Thanks,
> Zixuan
>


Re: [DISCUSS] PIP-342: Support OpenTelemetry metrics in Pulsar client

2024-03-03 Thread
+1

Enrico Olivelli 于2024年3月3日 周日16:58写道:

> I support the initiative
>
> Lgtm
>
>
> Thanks
> Enrico
>
> Enrico
>
> Il Dom 3 Mar 2024, 04:09 Matteo Merli  ha scritto:
>
> > PIP PR: https://github.com/apache/pulsar/pull/22178
> >
> > WIP of proposed implementation:
> > https://github.com/apache/pulsar/pull/22179
> >
> > 
> >
> > # PIP 342: Support OpenTelemetry metrics in Pulsar client
> >
> > ## Motivation
> >
> > Current support for metric instrumentation in Pulsar client is very
> limited
> > and poses a lot of
> > issues for integrating the metrics into any telemetry system.
> >
> > We have 2 ways that metrics are exposed today:
> >
> > 1. Printing logs every 1 minute: While this is ok as it comes out of the
> > box, it's very hard for
> >any application to get the data or use it in any meaningful way.
> > 2. `producer.getStats()` or `consumer.getStats()`: Calling these methods
> > will get access to
> >the rate of events in the last 1-minute interval. This is problematic
> > because out of the
> >box the metrics are not collected anywhere. One would have to start
> its
> > own thread to
> >periodically check these values and export them to some other system.
> >
> > Neither of the mechanisms that we have today is sufficient to enable
> > applications to easily
> > export the telemetry data of the Pulsar client SDK.
> >
> > ## Goal
> >
> > Provide a good way for applications to retrieve and analyze the usage of
> > Pulsar client operation,
> > in particular with respect to:
> >
> > 1. Maximizing compatibility with existing telemetry systems
> > 2. Minimizing the effort required to export these metrics
> >
> > ## Why OpenTelemetry?
> >
> > [OpenTelemetry](https://opentelemetry.io/) is quickly becoming the
> > de-facto
> > standard API for metric and
> > tracing instrumentation. In fact, as part of [PIP-264](
> > https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
> > we are already migrating the Pulsar server side metrics to use
> > OpenTelemetry.
> >
> > For Pulsar client SDK, we need to provide a similar way for application
> > builder to quickly integrate and
> > export Pulsar metrics.
> >
> > ### Why exposing OpenTelemetry directly in Pulsar API
> >
> > When deciding how to expose the metrics exporter configuration there are
> > multiple options:
> >
> >  1. Accept an `OpenTelemetry` object directly in Pulsar API
> >  2. Build a pluggable interface that describe all the Pulsar client SDK
> > events and allow application to
> > provide an implementation, perhaps providing an OpenTelemetry
> included
> > option.
> >
> > For this proposal, we are following the (1) option. Here are the reasons:
> >
> >  1. In a way, OpenTelemetry can be compared to [SLF4J](
> > https://www.slf4j.org/), in the sense that it provides an API
> > on top of which different vendor can build multiple implementations.
> > Therefore, there is no need to create a new
> > Pulsar-specific interface
> >  2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of
> > Pulsar client, we will only depend on its
> > API. Applications that are going to use OpenTelemetry, will include
> the
> > OTel SDK
> >  3. Providing a custom interface has several drawbacks:
> >  1. Applications need to update their implementations every time a
> new
> > metric is added in Pulsar SDK
> >  2. The surface of this plugin API can become quite big when there
> are
> > several metrics
> >  3. If we imagine an application that uses multiple libraries, like
> > Pulsar SDK, and each of these has its own
> > custom way to expose metrics, we can see the level of integration
> > burden that is pushed to application
> > developers
> >  4. It will always be easy to use OpenTelemetry to collect the metrics
> and
> > export them using a custom metrics API. There
> > are several examples of this in OpenTelemetry documentation.
> >
> > ## Public API changes
> >
> > ### Enabling OpenTelemetry
> >
> > When building a `PulsarClient` instance, it will be possible to pass an
> > `OpenTelemetry` object:
> >
> > ```java
> > interface ClientBuilder {
> > // ...
> > ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry
> > openTelemetry);
> >
> > ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality
> > metricsCardinality);
> > }
> > ```
> >
> > The common usage for an application would be something like:
> >
> > ```java
> > // Creates an OpenTelemetry instance using environment variables to
> > configure it
> > OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder()
> > .build().getOpenTelemetrySdk();
> >
> > PulsarClient client=PulsarClient.builder()
> > .serviceUrl("pulsar://localhost:6650")
> > .openTelemetry(otel)
> > .build();
> >
> > // 
> > ```
> >
> > Cardinality enum will allow to select a default cardinality label to be
> > attached to the
> > metrics:
> >
> > ```java
> > public enum MetricsCardinality {
> > /**
> >   

Re: Idea: Add a "From zero to hero" contributor roadmap article

2024-03-02 Thread
I support this.
Pulsar is a very complex piece of software; structuring this knowledge will
be very helpful.

Kiryl Valkovich 于2024年3月2日 周六22:01写道:

> Imagine that you are a Pulsar user or interested in data streaming in
> general and would like to contribute to some related open-source project.
> You have some prior programming experience but you've worked on other
> kinds of projects and never developed distributed streaming systems.
> The current documentation doesn't describe the prior knowledge list
> recommended to contribute to Pulsar.
> Having a clear roadmap of what to learn and where to start would bring
> more contributors to the project.
> It may have a similar format:
> •  Java
> Recommended resources:
> ...
> •  Concurrency and Parallelism
> Recommended resources:
> ...
> •  Networking
> Recommended resources:
> ...
> •  Data streaming
> Recommended resources:
> ...
> •  Cyber-security
> Recommended resources:
> ...
> •  Storage and Bookkeeper
> Recommended resources:
> ...
> •  Pulsar internals
> Recommended resources:
> ...
> •  Important libraries to know
> •  Netty
> •  Jetty
> •  Lombok
> ...
> •  Issue list for starting contributing
> ...
> Of course, it takes time to learn these things and can add some additional
> load on the PR reviewers' shoulders at first. But it also could help to
> grow a new wave of contributors who are loyal to the project from the
> beginning, which is important for growing the Pulsar contributors community.
>
> Discussion on GitHub: https://github.com/apache/pulsar/discussions/22176
>
>
> Best,
> Kiryl


Re: [DISCUSS] Migrate CLI parser from jcommander to picocli

2024-02-20 Thread
sounds great!
+1 nonbinding

Zixuan Liu 于2024年2月21日 周三13:08写道:

> Hi Pulsar Community,
>
> The pulsar CLI(pulsar-admin, pulsar-client, pulsar-shell, and so on) uses
> the jcommander [1] as CLI parser, which is an awesome project, but the
> maintainer is not active and cannot keep up with modern CLI
> features(auto-completion, sub-command, native-images, suggest commands, and
> so on). Then I found the picocli [2] project to meet these needs, which is
> active and powerful. For comparison, please see [3].
>
> Error prompt:
> ```
> bin/pulsar-admin clusters update cluster-a -b
>
> # jcommander
> Need to provide just 1 parameter
>
> # picocli
> Unknown option: '-b'
> ```
>
> Suggest commands:
> ```
> bin/pulsar-admin cluste
>
> # jcommander
> Expected a command, got cluste
>
> # picocli
> Unmatched argument at index 0: 'cluste'
> Did you mean: pulsar-admin clusters?
> ```
>
> What do you think about migrating CLI parse from jcommander to picocli?
>
> Thanks,
> Zixuan
>
> [1] - https://github.com/cbeust/jcommander
> [2] - https://picocli.info/
> [3] - https://github.com/remkop/picocli/wiki/picocli-vs-JCommander
>


Re: [ANNOUNCE] New Committer: Asaf Mesika

2024-02-20 Thread
congrats!

Lari Hotari 于2024年2月21日 周三00:50写道:

> The Apache Pulsar Project Management Committee (PMC) has invited
> Asaf Mesika https://github.com/asafm to become a committer and we
> are pleased to announce that he has accepted.
>
> Welcome and Congratulations, Asaf Mesika!
>
> Please join us in congratulating and welcoming Asaf onboard!
>
> Best Regards,
>
> Lari Hotari
> on behalf of the Pulsar PMC
>


Re: [VOTE] PIP-335: Oxia metadata support

2024-02-06 Thread
+1 nonbinding

Matteo Merli 于2024年2月6日 周二04:40写道:

> https://github.com/apache/pulsar/pull/22009
>
> -
>
> # PIP-335: Support Oxia metadata store plugin
>
> # Motivation
>
> Oxia is a scalable metadata store and coordination system that can be used
> as the core infrastructure
> to build large scale distributed systems.
>
> Oxia was created with the primary goal of providing Pulsar an alternative
> to replace ZooKeeper as the
> long term preferred metadata store, overcoming all the current limitations
> in terms of metadata
> access throughput and data set size.
>
> # Goals
>
> Add a Pulsar MetadataStore plugin that uses Oxia client SDK.
>
> Users will be able to start a Pulsar cluster using just Oxia, without any
> ZooKeeper involved.
>
> ## Not in Scope
>
> It's not in the scope of this proposal to change any default behavior or
> configuration of Pulsar.
>
> # Detailed Design
>
> ## Design & Implementation Details
>
> Oxia semantics and client SDK were already designed with Pulsar and
> MetadataStore plugin API in mind, so
> there is not much integration work that needs to be done here.
>
> Just few notes:
>  1. Oxia client already provides support for transparent batching of read
> and write operations,
> so there will be no use of the batching logic in
> `AbstractBatchedMetadataStore`
>  2. Oxia does not treat keys as a walkable file-system like interface, with
> directories and files. Instead
> all the keys are independent. Though Oxia sorting of keys is aware of
> '/' and provides efficient key
> range scanning operations to identify the first level children of a
> given key
>  3. Oxia, unlike ZooKeeper, doesn't require the parent path of a key to
> exist. eg: we can create `/a/b/c` key
> without `/a/b` and `/a` existing.
> In the Pulsar integration for Oxia we're forcing to create all parent
> keys when they are not there. This
> is due to several places in BookKeeper access where it does not create
> the parent keys, though it will
> later make `getChildren()` operations on the parents.
>
> ## Other notes
>
> Unlike in the ZooKeeper implementation, the notification of events is
> guaranteed in Oxia, because the Oxia
> client SDK will use the transaction offset after server reconnections and
> session restarted events. This
> will ensure that brokers cache will always be properly invalidated. We will
> then be able to remove the
> current 5-minute automatic cache refresh which is in place to prevent the
> ZooKeeper missed watch issue.
>
> # Links
>
> Oxia: https://github.com/streamnative/oxia
> Oxia Java Client SDK: https://github.com/streamnative/oxia-java
>
>
> --
> Matteo Merli
> 
>


Re: [DISCUSS] PIP-335: Oxia metadata support

2024-01-31 Thread
+1

Matteo Merli 于2024年2月1日 周四07:58写道:

> https://github.com/apache/pulsar/pull/22009
>
> ===
>
> # PIP-335: Support Oxia metadata store plugin
>
> # Motivation
>
> Oxia is a scalable metadata store and coordination system that can be used
> as the core infrastructure
> to build large scale distributed systems.
>
> Oxia was created with the primary goal of providing Pulsar an alternative
> to replace ZooKeeper as the
> long term preferred metadata store, overcoming all the current limitations
> in terms of metadata
> access throughput and data set size.
>
> # Goals
>
> Add a Pulsar MetadataStore plugin that uses Oxia client SDK.
>
> Users will be able to start a Pulsar cluster using just Oxia, without any
> ZooKeeper involved.
>
> ## Not in Scope
>
> It's not in the scope of this proposal to change any default behavior or
> configuration of Pulsar.
>
> # Detailed Design
>
> ## Design & Implementation Details
>
> Oxia semantics and client SDK were already designed with Pulsar and
> MetadataStore plugin API in mind, so
> there is not much integration work that needs to be done here.
>
> Just a few notes:
>  1. Oxia client already provides support for transparent batching of read
> and write operations,
> so there will be no use of the batching logic in
> `AbstractBatchedMetadataStore`
>  2. Oxia does not treat keys as a walkable file-system like interface, with
> directories and files. Instead
> all the keys are independent. Though Oxia sorting of keys is aware of
> '/' and provides efficient key
> range scanning operations to identify the first level children of a
> given key
>  3. Oxia, unlike ZooKeeper, doesn't require the parent path of a key to
> exist. eg: we can create `/a/b/c` key
> without `/a/b` and `/a` existing.
> In the Pulsar integration for Oxia we're forcing to create all parent
> keys when they are not there. This
> is due to several places in BookKeeper access where it does not create
> the parent keys, though it will
> later make `getChildren()` operations on the parents.
>
> ## Other notes
>
> Unlike in the ZooKeeper implementation, the notification of events is
> guaranteed in Oxia, because the Oxia
> client SDK will use the transaction offset after server reconnections and
> session restarted events. This
> will ensure that brokers cache will always be properly invalidated. We will
> then be able to remove the
> current 5-minute automatic cache refresh which is in place to prevent the
> ZooKeeper missed watch issue.
>
> # Links
>
> Oxia: https://github.com/streamnative/oxia
> Oxia Java Client SDK: https://github.com/streamnative/oxia-java
>
>
> --
> Matteo Merli
> 
>


Re: Status of "Oxia" in Apache ecosystem?

2024-01-22 Thread
Oxia is a project of StreamNative Inc.; it is not an Apache project. From my
understanding, it's a ZooKeeper proxy, like Apache ShardingSphere, for the
purpose of ZooKeeper's horizontal expansion.

Dave Fisher 于2024年1月23日 周二05:23写道:

> Hi Paul,
>
> There is no such thing as Apache Oxia. It is from a vendor in the Pulsar
> ecosystem - StreamNative. It is meant to plug into Pulsar and I think but
> am not sure that it is a replacement for Apache Zookeeper, but …
>
> Best,
> Dave
>
> > On Jan 22, 2024, at 12:57 AM, Brebner, Paul wrote:
> >
> > Hi Pulsar people,
> >
> > I’m track chair for a Community over Code EU track 2024, we’ve had a
> submission that mentions “Apache Oxia” and I’d like to check this accuracy
> of this and potential clarifications. I.e. I assume it’s not a top level
> Apache project, but maybe it’s part of Pulsar? In which case the title
> could do with a minor change etc, Regards, Paul Brebner
>
>


Re: [VOTE] PIP-330: getMessagesById gets all messages

2024-01-17 Thread
+1 nonbinding


> 2024年1月17日 17:35,Asaf Mesika  写道:
> 
> +1 (non-binding)
> 
> On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu  wrote:
> 
>> +1 (non-binding)
>> 
>> Thanks,
>> Dezhi Liu
>> 
>> On 2024/01/15 09:33:48 Zixuan Liu wrote:
>>> Hi Pulsar Community,
>>> 
>>> Voting for PIP-330: getMessagesById gets all messages
>>> 
>>> PIP: https://github.com/apache/pulsar/pull/21873
>>> Discussion thread:
>>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z
>>> 
>>> Thanks,
>>> Zixuan
>>> 
>> 



Re: [VOTE] PIP-326: Create a BOM to ease dependency management

2023-12-22 Thread
+1 nonbinding

Umut Bilal Okur 于2023年12月23日 周六00:13写道:

> +1 (non-binding)
>
> Enrico Olivelli , 22 Ara 2023 Cum, 18:26 tarihinde
> şunu yazdı:
>
> > +1 (binding)
> >
> > Enrico
> >
> > Il giorno ven 22 dic 2023 alle ore 02:46 Apurva Telang
> >  ha scritto:
> > >
> > > +1 (non-binding)
> > >
> > > On Thu, Dec 21, 2023 at 4:30 PM Matteo Merli 
> wrote:
> > >
> > > > +1
> > > >
> > > >
> > > > --
> > > > Matteo Merli
> > > > 
> > > >
> > > >
> > > > On Thu, Dec 21, 2023 at 4:24 PM Nicolò Boschi 
> > > > wrote:
> > > >
> > > > > +1 binding
> > > > >
> > > > > Nicolò Boschi
> > > > >
> > > > >
> > > > > Il giorno gio 21 dic 2023 alle 21:21 Lari Hotari <
> lhot...@apache.org>
> > ha
> > > > > scritto:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > -Lari
> > > > > >
> > > > > >
> > > > > > On Thu, 21 Dec 2023 at 22:10, Chris Bono 
> wrote:
> > > > > > >
> > > > > > > I'm starting the vote for PIP-326, since it has been reviewed
> by
> > > > > > > several members with no objections.
> > > > > > >
> > > > > > > PIP link: https://github.com/apache/pulsar/pull/21747
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Chris
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Apurva Telang.
> >
>


Re: [Vote] PIP-223: Add metrics for all Rest Endpoints

2023-12-20 Thread
Sorry, the PIP had approved, please ignore.

Tao Jiuming  于2023年12月21日周四 04:25写道:

>
> bump
>
> On 2022/11/28 08:47:01 Jiuming Tao wrote:
> > Dear Pulsar Community,
> >
> > Please review and vote on this PIP.
> >
> > PIP link: https://github.com/apache/pulsar/issues/18560
> >
> > Discuss thread:
> > https://lists.apache.org/thread/z74vcn0yolzzrcc4ftonm9j3nbk4pzxm
> >
> > Thanks,
> > Tao Jiuming
> >
>


Re: [DISCUSS] PIP-327 Support force topic loading for unrecoverable errors

2023-12-20 Thread
In my understanding, the PIP is for certain `extreme` conditions. A ledger
failing to recover is an event with a very low probability, and it should be
hard to reproduce (unless we delete some ledgers manually).

If we skip these unrecoverable ledgers, message production should be able
to proceed smoothly.

But for message consumption, how can we deal with it?
1. Skip them: it will lead to data loss, even if these ledgers only failed to
recover temporarily.
2. Don't skip them: consumers may not receive messages from brokers, and the
consumption of messages cannot proceed normally, even if these ledgers were
deleted and can never be recovered.

So we must accurately determine whether these ledgers are temporarily
unable to recover or will never be able to recover.
Maybe we need to persist the ledger's failed-recovery count in the
MetadataStore: if the ledger recovers successfully, reset it to 0; otherwise,
increment it.
And introduce a new configuration such as
`ledgerFailedToRecoverThreshold`:
if the ledger continues to fail to recover, and the count is
greater than `ledgerFailedToRecoverThreshold`, delete the ledger from
the MetadataStore.
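A minimal sketch of that bookkeeping (all names hypothetical — `LedgerRecoveryTracker` and `ledgerFailedToRecoverThreshold` are not existing Pulsar classes or settings; persistence to the MetadataStore is omitted):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Count consecutive recovery failures per ledger and flag a ledger for
// deletion from the metadata store once the configured threshold is exceeded.
public class LedgerRecoveryTracker {
    private final int ledgerFailedToRecoverThreshold;
    private final Map<Long, Integer> failureCounts = new ConcurrentHashMap<>();

    LedgerRecoveryTracker(int threshold) {
        this.ledgerFailedToRecoverThreshold = threshold;
    }

    void onRecoverySuccess(long ledgerId) {
        failureCounts.remove(ledgerId); // reset the count to 0
    }

    /** Returns true if the ledger should be dropped from the metadata store. */
    boolean onRecoveryFailure(long ledgerId) {
        int failures = failureCounts.merge(ledgerId, 1, Integer::sum);
        return failures > ledgerFailedToRecoverThreshold;
    }

    public static void main(String[] args) {
        LedgerRecoveryTracker tracker = new LedgerRecoveryTracker(2);
        tracker.onRecoveryFailure(42L);
        tracker.onRecoveryFailure(42L);
        System.out.println(tracker.onRecoveryFailure(42L)); // prints true: threshold exceeded
    }
}
```

In a real implementation the count would have to survive broker restarts, which is why persisting it in the MetadataStore is suggested above.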

Thanks

PengHui Li  于2023年12月20日周三 16:32写道:

> Hi Rajan,
>
> I tried to test the case that you provided in the proposal.
>
> - Produce messages to a topic
> - Unload the topic 5 times to ensure we have some ledgers in the topic
> - Delete one ledger by using the bookkeeper shell
> - Unload the topic again
> - Start to produce messages again, it works
> - Start a consumer to consume messages from the earliest position, it get
> stuck on the deleted ledger
>
> I don't see the topic load issues. The topic loading works fine, and the
> producer works fine.
> But the proposal said it would resolve the topic load issue, can you
> reproduce the topic load issue?
>
> Regards,
> Penghui
>
>
>
> On Wed, Dec 20, 2023 at 3:28 AM Rajan Dhabalia 
> wrote:
>
> > Hi,
> >
> > We have an issue to fail loading topics in unrecoverable situation and
> > impacting topic availability::
> > https://github.com/apache/pulsar/issues/21751
> > This PIP addresses the issue and allows brokers to handle such situations
> > and maintain the topic availability:
> >
> > PIP: https://github.com/apache/pulsar/pull/21752
> >
> > Thanks,
> > Rajan
> >
>


Re: [ANNOUNCE] Daniel Blankensteiner as new PMC member in Apache Pulsar

2023-11-27 Thread
congrats!

Lari Hotari 于2023年11月27日 周一19:46写道:

> Congratulations on becoming a PMC member, Daniel! Well deserved!
>
> -Lari
>
> On 2023/11/27 08:12:15 tison wrote:
> > The Apache Pulsar Project Management Committee (PMC) has invited
> > Daniel Blankensteiner https://github.com/blankensteiner to join the PMC,
> > and we
> > are pleased to announce that he has accepted.
> >
> > Blanken is the original author of DotPulsar[1]. He is actively
> maintaining
> > the project for years.
> >
> > DotPulsar is the official .NET client library for Apache Pulsar, written
> > entirely in C# and implements Apache Pulsar's binary protocol.
> >
> > On behalf of the Pulsar PMC, we extend a heartfelt welcome and
> > congratulations to Daniel Blankensteiner.
> >
> > Best,
> > tison.
> >
> > [1] https://github.com/apache/pulsar-dotpulsar
> >
>


Re: [VOTE] PIP-312 Use StateStoreProvider to manage state in Pulsar Functions endpoints

2023-11-17 Thread
+1 non binding

Zili Chen 于2023年11月17日 周五17:56写道:

> +1 binding
>
> Thanks for your proposal.
>
> On 2023/11/15 03:39:42 Pengcheng Jiang wrote:
> > Hi Pulsar Community,
> >
> > This thread is to start a vote for PIP-312: Use StateStoreProvider to
> > manage state in Pulsar Functions endpoints.
> >
> > I start the voting process since there are some approves for the PIP PR.
> >
> > PR: https://github.com/apache/pulsar/pull/21438
> > Discussion thread:
> > https://lists.apache.org/thread/0rz29wotonmdck76pdscwbqo19t3rbds
> >
> > Sincerely,
> > Pengcheng Jiang
> >
>


Re: [OT] Evaluate Virtual thread [WAS][DISCUSS] Moving to Java 21

2023-11-07 Thread
In the past, threads usually blocked in LockSupport.park(…), and AQS
depends on it to park threads.

Virtual threads only solve thread blocking at the JDK layer, by rewriting
LockSupport.park(…).

In some conditions, the carrier thread will still be blocked (I only remember
the following points):
1. JNI calls
2. synchronized blocks
3. system calls
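A minimal JDK 21 illustration of point 2: blocking while holding a monitor pins the virtual thread to its carrier, whereas the same blocking call outside `synchronized` lets the virtual thread unmount. Running with `-Djdk.tracePinnedThreads=full` prints the pinning stack trace (the class name here is just for the example).

```java
import java.time.Duration;
import java.util.concurrent.Executors;

public class PinningDemo {
    static final Object LOCK = new Object();

    public static void main(String[] args) {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> {
                synchronized (LOCK) {            // monitor held -> virtual thread is pinned
                    sleep(Duration.ofMillis(100)); // carrier thread stays blocked here
                }
            });
            executor.submit(() -> sleep(Duration.ofMillis(100))); // unmounts freely
        } // close() waits for both tasks to finish
    }

    static void sleep(Duration d) {
        try {
            Thread.sleep(d);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```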

tison 于2023年11月7日 周二21:19写道:

> Hi,
>
> I check the docs for Virtual Threads[1][2][3]. It comes up to me with two
> major concerns about its real-world improvement for Pulsar's scenario:
>
> 1. All of the virtual threads share the same schedule pool, which means
> that all tasks run on virtual threads competing with each other. It can be
> better to separate different logical concurrent groups into dedicated
> groups, although Goroutines share the same global scheduler also.
>
> 2. The point where the virtual thread "yield" ("unmount" in the documents)
> is not quite clear. It's written to be "usually Blocking IO" but can be
> also Future::get or others. It's not easy to audit the change.
>
> Best,
> tison.
>
> [1] https://openjdk.org/jeps/444
> [2]
>
> https://blogs.oracle.com/javamagazine/post/going-inside-javas-project-loom-and-virtual-threads
> [3] https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html
>
>
> Lari Hotari  于2023年10月21日周六 13:22写道:
>
> > Thanks for suggesting. That's a good way to prevent regressions. I made
> > the changes to schedule a daily build with JDK 21. Please review
> > https://github.com/apache/pulsar/pull/21410
> >
> > -Lari
> >
> > On 2023/10/20 12:22:44 Christophe Bornet wrote:
> > > Nice.
> > > Would it be possible to have a daily build on JDK 21 to ensure it runs
> > > properly ?
> > >
> > > Le ven. 20 oct. 2023 à 00:34, Lari Hotari  a
> écrit :
> > > >
> > > > Hi all,
> > > >
> > > > I can now confirm that apache/pulsar master branch compiles and runs
> > all tests in Pulsar CI successfully with Java 21. Therefore, we have
> > already accomplished the first level of Java 21 support.
> > > >
> > > > Example of Pulsar CI build with Java 21:
> > > > https://github.com/lhotari/pulsar/actions/runs/6577911040
> > > >
> > > > This experiment was run with PR #21400 changes which adds an option
> in
> > manually triggered GitHub Actions based Pulsar CI builds with Java 21
> > selected as the runtime for the build and test runtime and also as the
> Java
> > runtime for docker images/containers used in integration & system tests
> > which are part of the Pulsar CI build.
> > > >
> > > > Please review the PR https://github.com/apache/pulsar/pull/21400,
> > let's get it merged.
> > > > By default, Java 17 will be used, so it should be ok to merge this to
> > master branch without any separate decisions such as PIPs.
> > > >
> > > > -Lari
> > > >
> > > > On 2023/10/19 12:23:03 Lari Hotari wrote:
> > > > > I have created https://github.com/apache/pulsar/pull/21400 which
> > parameterizes the JDK version used for the Pulsar CI GitHub Actions
> > workflow. When triggering the workflow
> > > > > manually, it's possible to choose between JDK 17 and JDK 21 from a
> > dropdown menu.
> > > > > The PR contains more details, please review. Once we have this
> > merged, it will be easy to experiment with Java 21 when needed.
> > > > >
> > > > > -Lari
> > > > >
> > > > > On 2023/10/19 03:06:39 tison wrote:
> > > > > > > I think Java 21 can open the door for Virtual Threads[1].
> > > > > >
> > > > > > Yep. This should be a good motivation for using JDK 21.
> > > > > >
> > > > > > We may start a survey in the community a few months later for JDK
> > 21
> > > > > > feedback (as we /will/ switch the runtime to JDK 21 in Docker)
> and
> > try to
> > > > > > switch the toolkit.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > Zixuan Liu  于2023年10月19日周四 10:56写道:
> > > > > >
> > > > > > > +1 for compatibility with Java 21.
> > > > > > >
> > > > > > > Next step: Migrating the Pulsar Server runtime to Java 21 from
> > Java 17
> > > > > > > in the Docker image and CI. Pulsar Client/Admin continues to
> use
> > Java
> > > > > > > 8.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Zixuan
> > > > > > >
> > > > > > > Lari Hotari  于2023年10月18日周三 06:02写道:
> > > > > > > >
> > > > > > > > Dear Pulsar community,
> > > > > > > >
> > > > > > > > Java 21 was released on September 19th and has now become the
> > current
> > > > > > > Java LTS release.
> > > > > > > >
> > > > > > > > I've begun preparations in the Pulsar code base to allow for
> > Java 21 to
> > > > > > > be used as the development runtime for compiling the code and
> > running tests
> > > > > > > in the master branch. This is a proactive measure to gear up
> for
> > Java 21
> > > > > > > without committing to the switch just yet. It will help us
> > understand the
> > > > > > > necessary changes when we are able to compile the code and run
> > all tests
> > > > > > > with Java 21.
> > > > > > > >
> > > > > > > > For instance, I initiated the process with the following PRs:
> > > 

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-03 Thread
Hi Girish,

There is also a discussion thread[1] about rate-limiting.

I think there is a conflict between some kinds of rate limiters and
backpressure.

Take the fail-fast strategy as an example:
brokers have to reply to clients after receiving and decoding the message,
but the broker also has a back-pressure mechanism: the broker cannot read
messages while auto-read is disabled on the channel (`autoRead = false`).

So the rate limiters have to adapt to back-pressure.
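One way to picture that adaptation (a hedged, self-contained sketch, not Pulsar's actual rate limiter — `PublishRateLimiter` here is hypothetical, and the boolean stands in for Netty's `channel.config().setAutoRead(...)`): instead of rejecting messages that were already decoded, the limiter simply stops reading from the channel until the next window.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PublishRateLimiter {
    private final long permitsPerSecond;
    private final AtomicLong usedInWindow = new AtomicLong();
    private volatile boolean autoRead = true; // stand-in for the channel's auto-read flag

    PublishRateLimiter(long permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    /** Called for each decoded publish request. */
    void onPublish() {
        if (usedInWindow.incrementAndGet() > permitsPerSecond) {
            autoRead = false; // back-pressure: stop reading instead of rejecting
        }
    }

    /** Called by a scheduled task once per second. */
    void resetWindow() {
        usedInWindow.set(0);
        autoRead = true; // resume reading
    }

    boolean isAutoRead() {
        return autoRead;
    }
}
```

With this shape there is nothing to "fail fast" on: excess traffic is never read off the socket in the first place, which is exactly why a fail-fast reply strategy conflicts with back-pressure.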

Thanks,
Tao Jiuming

2023年10月19日 20:51,Girish Sharma  写道:

Hi,
Currently, there are only 2 kinds of publish rate limiters - polling based
and precise. Users have an option to use either one of them in the topic
publish rate limiter, but the resource group rate limiter only uses polling
one.

There are challenges with both the rate limiters and the fact that we can't
use precise rate limiter in the resource group level.

Thus, in order to support custom rate limiters, I've created the PIP-310

This is the discussion thread. Please go through the PIP and provide your
inputs.

Link - https://github.com/apache/pulsar/pull/21399

Regards
-- 
Girish Sharma


Re: [VOTE] PIP-271: Add broker health check status into prometheus metrics #20147

2023-10-26 Thread
+1 non binding

Thanks,
Tao Jiuming

Rajan Dhabalia 于2023年10月27日 周五07:00写道:

> Thanks for adding it.
> +1
>
> Thanks,
> Rajan
>
> On Thu, Oct 26, 2023 at 3:52 PM vineeth p 
> wrote:
>
> > Broker metrics don't have anything to indicate the health of the broker
> (to
> > indicate if the broker is active). In Prometheus broker metrics which are
> > used for monitoring, it will be useful if metrics also show the broker
> > health. This way, Prometheus can automatically scrape the broker state
> and
> > can be used for monitoring purposes. So we need such a metric to capture
> > broker health.
> >
> > PR: https://github.com/apache/pulsar/issues/20389
> >
> > Link to discussion:
> > https://lists.apache.org/thread/dmmqo3bnhtr42vt25yph30nxf6cz57pf
> >
>


Re: [VOTE] PIP-307: Support subscribing multi-topics for WebSocket

2023-10-24 Thread
+1(no-binding)

Thanks,
Tao Jiuming

Cong Zhao 于2023年10月24日 周二16:03写道:

> +1(no-binding)
>
> Thanks,
> Cong Zhao
>
> On 2023/10/19 12:47:45 guo jiwei wrote:
> > Hi dev,
> >Currently WebSocket only supports the consumption of a single topic,
> > which cannot satisfy users' consumption scenarios of multiple topics.  So
> > in order to support consumption of multiple topics or pattern topics, I
> > would like to start a vote for PIP-307
> > .
> >
> >
> > Ref:
> > • Discuss Mail:
> > https://lists.apache.org/thread/co8396ywny161x91dffzvxlt993mo1ht
> > • PIP-307: https://github.com/apache/pulsar/pull/21390
> >
> >
> > Regards
> > Jiwei Guo (Tboy)
> >
>


Re: [VOTE] PIP-300: Add custom dynamic configuration for plugins

2023-10-08 Thread
+1 (no-binding)


Zixuan Liu 于2023年9月26日 周二10:54写道:

> Hi Pulsar Community,
>
> Voting for PIP-300: https://github.com/apache/pulsar/pull/21127
> Discussion thread:
> https://lists.apache.org/thread/ysnsnollgy1b6w1dsvmx1t1y2rz1tyd6
>
> Thanks,
> Zixuan
>


Re: [VOTE] PIP-302 Introduce refreshAsync API for TableView

2023-10-08 Thread
+1 (no-binding)


Xiangying Meng 于2023年9月27日 周三15:05写道:

> Hi dev,
>This thread is to start a vote for PIP-302 Add new API
> refreshAsync for TableView.
> Discuss thread:
> https://lists.apache.org/thread/o085y2314o0fymvx0x8pojmgjwcwn59q
> PIP: https://github.com/apache/pulsar/pull/21166
>
> BR,
> Xiangying
>


Re: [VOTE] Accept pulsar-admin-go as part of the Apache Pulsar project

2023-08-05 Thread
+1 non-binding

Thanks,
Tao Jiuming

Zixuan Liu 于2023年8月5日 周六22:58写道:

> +1 (non-binding)
>
> Thanks,
> Zixuan
>
>  于2023年8月5日周六 21:11写道:
> >
> > +1 (binding)
> >
> > Best,
> > Mattison
> > On 5 Aug 2023 at 20:05 +0800, Yunze Xu , wrote:
> > > +1 (binding)
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Sat, Aug 5, 2023 at 12:15 PM tison  wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'd like to start a vote thread for accepting pulsar-admin-go[1] as
> part of
> > > > the Apache Pulsar project. pulsar-admin-go is a Golang client library
> > > > interacting with Pulsar Admin API. You can see the full proposal,
> including
> > > > the motivation and features at [2].
> > > >
> > > > Please vote with your opinions. The VOTE will remain open for at
> least 72
> > > > hours.
> > > >
> > > > [ ] +1 approve
> > > > [ ] +0 no opinion
> > > > [ ] -1 disapprove with the reason
> > > >
> > > > Vote from PMC members is binding.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > > [1] https://github.com/streamnative/pulsar-admin-go
> > > > [2] https://lists.apache.org/thread/tb6c10qlkg9fj56qc0ldkwc79j7qm0vc
>


Re: [DISCUSS] Proposal for donating pulsar-admin-go to the Apache Pulsar project

2023-07-31 Thread
+1

Thanks,
Tao Jiuming

Yunze Xu 于2023年7月31日 周一16:26写道:

> LGTM.
>
> Thanks,
> Yunze
>
> On Wed, Jul 26, 2023 at 5:11 PM tison  wrote:
> >
> > Hi,
> >
> > ## Motivation
> >
> > Pulsar brokers serve two kinds of client-side requests: data panel
> requests
> > (often known as client-api) and control panel requests (often known as
> > admin-api).
> >
> > Pulsar has multiple client libraries support among Java, C++, Python,
> > Golang, C#, Node.js, etc. While most of them implement the client-api,
> > currently, only the Java library implements the admin-api.
> >
> > Nowadays, many Pulsar deployments are in cloud environments, where Golang
> > dominates the domain. Thus, there is an increasing requirement to make
> > control panel requests with integrations of those Golang libraries.
> >
> > StreamNative invented pulsar-admin-go[1] during its development of Pulsar
> > Cloud services. The library is open-sourced months ago[2].
> >
> > Now, we’d propose donating this library to the Apache Pulsar project
> cause
> > Apache provides a solid way to release software, and we may find some
> > opportunities to deduplicate code between pulsar-admin-go and
> > pulsar-client-go.
> >
> > [1] https://github.com/streamnative/pulsar-admin-go
> > [2] https://github.com/apache/pulsar/discussions/19932
> >
> > ## Features
> >
> > pulsar-admin-go follows all the features that Java’s pulsar-client-admin
> > provides. Currently, it supports management over:
> >
> > - BrokerStats
> > - Brokers
> > - Clusters
> > - Functions
> > - FunctionsWorker
> > - Namespaces
> > - NsIsolationPolicy
> > - Packages
> > - ResourceQuotas
> > - Schema
> > - Sinks
> > - Sources
> > - Subscriptions
> > - Tenants
> > - Topics
> >
> > ## Development
> >
> > The open issue for pulsar-admin-go’s development is how we organize code
> > into repos for pulsar-admin-go and pulsar-client-go.
> >
> > Java’s pulsar-client-admin depends on pulsar-client. Although
> > pulsar-admin-go doesn’t depend on pulsar-client-go, they share the same
> > code implementation of auth logics.
> >
> > Generally, maintaining duplicate code is not a good idea. So we have two
> > ways to avoid this situation.
> >
> > The first one is hosting code in three repos: pulsar-admin-go,
> > pulsar-client-go, pulsar-auth-go. Then factor out the shared auth logics
> > into pulsar-auth-go and let pulsar-admin-go and pulsar-client-go depend
> on
> > it.
> >
> > In this way, although it provides a clear boundary between modules, it
> may
> > increase the burden of managing issues and releasing them.
> >
> > Thus, I’d prefer the second approach: hosting pulsar-client-go and
> > pulsar-admin-go code both under the current apache/pulsar-client-go repo,
> > and deduplicate auth logics by letting pulsar-admin-go use the current
> auth
> > logic provided in pulsar-client-go.
> >
> > In this way, we simplify the process of managing issues and doing
> releases,
> > while we should take more care of how to release these three logically
> > separated modules in one repo.
> >
> > If nowadays Golang can compile code in packages, we should be fine to
> merge
> > pulsar-admin-go code just into apache/pulsar-client-go in packages.
> > Otherwise, we should refactor the code into three go modules.
> >
> > For version strategy, as Java’s pulsar-client-admin and pulsar-client do
> > simultaneous releases, pulsar-client-go and pulsar-admin-go can do
> > simultaneous releases also. I expect that pulsar-admin-go doesn’t evolve
> > quite rapidly to require its own release cycle, but almost bugfixes along
> > with the evolution of pulsar-client-go.
> >
> > ## Schedule
> >
> > [ ] Consensus on the overall direction
> > [ ] Consensus on the repo organizing strategy
> > [ ] IP clearance
> > [ ] Repo transfer and setup
> >
> > Looking forward to your feedback!
> >
> > Best,
> > tison.
>


Re: [ANNOUNCE] Qiang Zhao as new PMC member in Apache Pulsar

2023-03-29 Thread
Congrats!!

Thanks,
Tao Jiuming

> 2023年3月29日 23:51,Devin Bost  写道:
> 
> Congrats!
> 
> Devin G. Bost
> 
> 
> On Wed, Mar 29, 2023 at 6:38 AM ZhangJian He  wrote:
> 
>> Congratulations!
>> 
>> Thanks
>> ZhangJian He
>> 
>> 
>> On Wed, 29 Mar 2023 at 19:33, Haiting Jiang 
>> wrote:
>> 
>>> Congratulations!
>>> 
>>> 
>>> Haiting
>>> 
>>> On Wed, Mar 29, 2023 at 5:29 PM Cong Zhao  wrote:
 
 Congrats! Qiang.
 
 
 Thanks,
 Cong Zhao
 
 On 2023/03/29 03:22:43 guo jiwei wrote:
> Dear Community,
> 
> We are thrilled to announce that Qiang Zhao
> (https://github.com/mattisonchao) has been invited and has accepted
>>> the
> role of member of the Apache Pulsar Project Management Committee
>> (PMC).
> 
> Qiang has been a vital asset to our community, consistently
> demonstrating his dedication and active participation through
> significant contributions. In addition to his technical
>> contributions,
> Qiang also plays an important role in reviewing pull requests and
> ensuring the overall quality of our project. We look forward to his
> continued contributions.
> 
> On behalf of the Pulsar PMC, we extend a warm welcome and
> congratulations to Qiang Zhao.
> 
> Best regards
> Jiwei
> 
>>> 
>> 



Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-15 Thread
> Perhaps clarify, they can call:
> getStats(subscriptionBacklogSize=true, getPreciseBacklog=false,
> getEarliestTimeInBacklog=false)

> Each subscription will contain backlogSize.
> The subscription with max backlogSize will also be the one with oldest
> age.

Yes, "the subscription with max backlogSize will also be the one with the
oldest age."
But we don't expose the backlog message age — say, users just want to clear
the backlog by using `backlogQuotaTime`.

However, it doesn't matter. It is just a way for users to troubleshoot the
backlog and does not affect the implementation of this PIP.
The PIP just gives a use case that users can adapt to the actual situation.
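For the troubleshooting use case above: given the per-subscription `backlogSize` values returned by `Topics#getStats` and the topic-level `backlogQuotaSizeBytes`, finding the offending subscriptions is simple client-side filtering. A sketch (`BacklogInspector` is a hypothetical helper, not a Pulsar API):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BacklogInspector {
    // List subscriptions whose backlog exceeds the quota, largest first.
    static List<String> subscriptionsOverQuota(Map<String, Long> backlogSizes,
                                               long quotaBytes) {
        return backlogSizes.entrySet().stream()
                .filter(e -> e.getValue() > quotaBytes)
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = Map.of("subA", 100L, "subB", 10L, "subC", 50L);
        System.out.println(subscriptionsOverQuota(sizes, 40L)); // prints [subA, subC]
    }
}
```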

Thanks,
Tao Jiuming


Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-14 Thread
> Need to replace (place link) with link.

I replaced the `Motivation` with your advice.

> We discussed adding the subscription name which triggered the time limit
to
> Topics.getStats().
> Why?

Since we have `pulsar_storage_backlog_eviction_count`,
I think we don't need to expose the subscription name which triggered the
backlog eviction.

> I have to run getStats(getEarliestTimeInBacklog=true) and it's way more
> expensive than the proposal above, since it needs to reach the earliest
> message for *each* subscription.

I don't think we need to save that expense; it is only incurred when the user
requests it.
If the user does not set `getEarliestTimeInBacklog` to true, there is no such
overhead.
We don't need to add complexity for very few calls.

> Also a bit less accurate - you want to get the subscription cached that
> triggered it, using the same number to find it. Earliest backlog is
> accurate but if the configuration flag is off, it's not the same number as
> getStats.

Such problems do exist. Maybe there is a large backlog when the user receives
the alert, but it has already shrunk by the time the
endpoint (Topics#getStats) is requested.
There is a time difference between them. However, when the user receives an
alarm, it is only a notification;
when the user requests the endpoint, they may take action.
I think it is reasonable to provide users with a more accurate backlog
before they act.

Thanks,
Tao Jiuming

Asaf Mesika  于2023年3月14日周二 16:51写道:

> >
> > Pulsar has a feature called backlog quota (place link)
>
> Need to replace (place link) with link.
>
>
>
> >1. Find the backlog subscriptions
> >After received the alarm, users could request
> Topics#getStats(topicName,
> >true/false, true, true)
> ><
> https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139>
> to
> >get the topic stats, and find which subscriptions are in backlog.
> >Pulsar exposed backlogSize and earliestMsgPublishTimeInBacklog in the
> >subscription level, and we will expose backlogQuotaSizeBytes and
> >backlogQuotaTimeSeconds in the topic level, so users could find which
> >subscriptions in backlog easily.
> >
> > We have forgotten the other comment.
> We discussed adding the subscription name which triggered the time limit to
> Topics.getStats().
> Why?
>
> I have to run getStats(getEarliestTimeInBacklog=true) and it's way more
> expensive than the proposal above, since it needs to reach the earliest
> message for *each* subscription.
> Also a bit less accurate - you want to get the subscription cached that
> triggered it, using the same number to find it. Earliest backlog is
> accurate but if the configuration flag is off, it's not the same number as
> getStats.
>
>
> Nice to have (not mandatory) additions:
>
> I would add before
>
> >
> >1. After readEntryComplete
> ><
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L2780
> >,
> >cache its result:
> >
> > When this configuration flag is set to true, the broker does an I/O call
> by reading the oldest entry to get its write timestamp. Once we have that,
> we'll add caching to that value since we're going to use it for returning
> the age.
>
> I would add before:
>
> > slowestReaderTimeBasedBacklogQuotaCheck
> > <
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L2817>
> is
> > a totally in-memory method, we just need to cache the
> >
>
> When this configuration flag is set to false, the check uses an estimate of
> the oldest entry timestamp, by taking the closing time of the ledger which
> the message is contained at.
>
> On Fri, Mar 10, 2023 at 8:29 AM 太上玄元道君  wrote:
>
> > I think yes, to avoid missing something, you can take a look if you have
> > time.
> >
> > Thanks,
> > Tao Jiuming
> >
> > Asaf Mesika  于2023年3月9日周四 17:40写道:
> >
> > > Is the PIP updated with all comments?
> > >
> > > On Thu, Mar 9, 2023 at 8:59 AM 太上玄元道君  wrote:
> > >
> > > > > backlogQuotaLimitSize
> > > > > should be `backlogQuotaSizeBytes`
> > > >
> > > > > backlogQuotaLimitTime
> > > > > should be `backlogQuotaTimeSeconds`
> > > >
> > > > > So you need to rename the metric.
> > > > > "pulsar_storage_backlog_quota_count" -->
> > > > > `pulsar_storage_backlo

[Vote] PIP-248: Add metrics for backlog eviction events

2023-03-13 Thread
Hi all,

The thread is to start the vote for PIP-248

Discussion thread:
https://lists.apache.org/thread/m7khb49pk7vdx5yxvhzhkbo7t48d0rzs
PIP link: https://github.com/apache/pulsar/issues/19601

Thanks,
Tao Jiuming


Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-09 Thread
I think yes, to avoid missing something, you can take a look if you have
time.

Thanks,
Tao Jiuming

Asaf Mesika  于2023年3月9日周四 17:40写道:

> Is the PIP updated with all comments?
>
> On Thu, Mar 9, 2023 at 8:59 AM 太上玄元道君  wrote:
>
> > > backlogQuotaLimitSize
> > > should be `backlogQuotaSizeBytes`
> >
> > > backlogQuotaLimitTime
> > > should be `backlogQuotaTimeSeconds`
> >
> > > So you need to rename the metric.
> > > "pulsar_storage_backlog_quota_count" -->
> > > `pulsar_storage_backlog_eviction_count`
> >
> > > the topic's existing subscription.
> > > "subscription" --> "subscription*s*"
> >
> > > Number of backlog quota happends.
> > > Number of times backlog evictions happened due to exceeding backlog
> quota
> > > (either time or size).
> >
> > Accepted, if there is no more need to change, I'll start the vote next
> > week.
> >
> > Thanks,
> > Tao Jiuming
> >
> >
> > Asaf Mesika  于2023年3月7日周二 00:02写道:
> >
> > > >
> > > > Pulsar has a feature called backlog quota (place link).
> > >
> > > You need to place a link :)
> > >
> > > Expose pulsar_storage_backlog_quota_count in the topic leve
> > >
> > > You already have "pulsar_storage_backlog_size", so why do you need this
> > > metric for?
> > >
> > > backlogQuotaLimitSize
> > >
> > > should be `backlogQuotaSizeBytes`
> > >
> > > backlogQuotaLimitTime
> > >
> > > should be `backlogQuotaTimeSeconds`
> > >
> > > What about goal no.4? Expose oldest unacknowledged message subscription
> > > name?
> > >
> > > IMO, metrics are like API - perhaps indicate the change there as well
> > >
> > > Record the event when dropBacklogForSizeLimit
> > > > <
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BacklogQuotaManager.java#L121
> > > >
> > > >  or dropBacklogForTimeLimit
> > > > <
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BacklogQuotaManager.java#L194
> > >
> > > is
> > > > going to invoked.
> > >
> > >
> > > Oh, now I get it.
> > > So you need to rename the metric.
> > > "pulsar_storage_backlog_quota_count" -->
> > > `pulsar_storage_backlog_eviction_count`
> > >
> > >
> > > > the topic's existing subscription.
> > >
> > > "subscription" --> "subscription*s*"
> > >
> > > Number of backlog quota happends.
> > >
> > > Number of times backlog evictions happened due to exceeding backlog
> quota
> > > (either time or size).
> > >
> > >
> > > >1. Find the backlog subscriptions
> > > >After received the alarm, users could request
> > > Topics#getStats(topicName,
> > > >true/false, true, true)
> > > ><
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139
> > >
> > > to
> > > >get the topic stats, and find which subscriptions are in backlog.
> > > >Pulsar exposed backlogSize and earliestMsgPublishTimeInBacklog in
> > the
> > > >subscription level, and we will expose backlogQuotaLimitSize and
> > > >backlogQuotaLimitTime in the topic level, so users could find
> which
> > > >subscriptions in backlog easily.
> > > >
> > > > I wrote how it should be done IMO in a previous email.
> > >
> > >
> > > On Mon, Mar 6, 2023 at 1:20 PM 太上玄元道君  wrote:
> > >
> > > > Hi Aasf,
> > > > I've updated the PIP, PTAL
> > > >
> > > > Thanks,
> > > > Tao Jiuming
> > > >
> > > > Asaf Mesika  于2023年3月5日周日 21:00写道:
> > > >
> > > > > On Thu, Mar 2, 2023 at 12:57 PM 太上玄元道君  wrote:
> > > > >
> > > > > > > I  think you should fix this explanation:
> > > > > >
> > > > > > Thanks! I would like to copy the context you provide to the PIP
> > > > > motivation,
> > > > > > your description is more detailed, so dev

Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-08 Thread
> backlogQuotaLimitSize
> should be `backlogQuotaSizeBytes`

> backlogQuotaLimitTime
> should be `backlogQuotaTimeSeconds`

> So you need to rename the metric.
> "pulsar_storage_backlog_quota_count" -->
> `pulsar_storage_backlog_eviction_count`

> the topic's existing subscription.
> "subscription" --> "subscription*s*"

> Number of backlog quota happends.
> Number of times backlog evictions happened due to exceeding backlog quota
> (either time or size).

Accepted. If there are no further changes needed, I'll start the vote next week.

Thanks,
Tao Jiuming


Asaf Mesika wrote on Tue, Mar 7, 2023 at 00:02:

> >
> > Pulsar has a feature called backlog quota (place link).
>
> You need to place a link :)
>
> Expose pulsar_storage_backlog_quota_count in the topic level
>
> You already have "pulsar_storage_backlog_size", so what do you need this
> metric for?
>
> backlogQuotaLimitSize
>
> should be `backlogQuotaSizeBytes`
>
> backlogQuotaLimitTime
>
> should be `backlogQuotaTimeSeconds`
>
> What about goal no.4? Expose oldest unacknowledged message subscription
> name?
>
> IMO, metrics are like an API - perhaps indicate the change there as well
>
> Record the event when dropBacklogForSizeLimit
> > <
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BacklogQuotaManager.java#L121
> >
> >  or dropBacklogForTimeLimit
> > <
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BacklogQuotaManager.java#L194>
> is
> > going to be invoked.
>
>
> Oh, now I get it.
> So you need to rename the metric.
> "pulsar_storage_backlog_quota_count" -->
> `pulsar_storage_backlog_eviction_count`
>
>
> > the topic's existing subscription.
>
> "subscription" --> "subscription*s*"
>
> Number of backlog quota happends.
>
> Number of times backlog evictions happened due to exceeding backlog quota
> (either time or size).
>
>
> >1. Find the backlog subscriptions
> >After receiving the alarm, users could request
> Topics#getStats(topicName,
> >true/false, true, true)
> ><
> https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139>
> to
> >get the topic stats, and find which subscriptions are in backlog.
> >Pulsar exposed backlogSize and earliestMsgPublishTimeInBacklog in the
> >subscription level, and we will expose backlogQuotaLimitSize and
> >backlogQuotaLimitTime in the topic level, so users could find which
> >subscriptions are in backlog easily.
> >
> > I wrote how it should be done IMO in a previous email.
>
>
> On Mon, Mar 6, 2023 at 1:20 PM 太上玄元道君  wrote:
>
> > Hi Asaf,
> > I've updated the PIP, PTAL
> >
> > Thanks,
> > Tao Jiuming
> >
> > Asaf Mesika wrote on Sun, Mar 5, 2023 at 21:00:
> >
> > > On Thu, Mar 2, 2023 at 12:57 PM 太上玄元道君  wrote:
> > >
> > > > > I  think you should fix this explanation:
> > > >
> > > > Thanks! I would like to copy the context you provide to the PIP
> > > motivation,
> > > > your description is more detailed, so developers don't have to go
> > through
> > > > the code.
> > > >
> > >
> > > Sure
> > >
> > >
> > > >
> > > > > Today the quota is checked periodically, right? So that's how the
> > > > operator
> > > > > knows the cost in terms of I/O is limited.
> > > > > Now you are adding one additional I/O per collection, every 1 min
> by
> > > > > default. That's a lot perhaps. How long is the check interval
> today?
> > > >
> > > > Actually, I don't want to introduce additional costs, I thought we
> > > > could cache its result, so that it won't introduce additional costs.
> > > > It may be that I did not make it clear in the PIP and caused this
> > > > misunderstanding, sorry.
> > > >
> > >
> > > Ok, just to verify: You plan to modify the code that runs periodically
> > the
> > > backlog quota check, so the result will be cached there? This way when
> > you
> > > pull that information from that code every 1min to expose it as a
> metric
> > it
> > > will have 0 I/O cost?
> > >
> > >
> > >
> > > >
> > > > > The user today can calculate q

Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-06 Thread
Hi Asaf,
I've updated the PIP, PTAL

Thanks,
Tao Jiuming

Asaf Mesika wrote on Sun, Mar 5, 2023 at 21:00:

> On Thu, Mar 2, 2023 at 12:57 PM 太上玄元道君  wrote:
>
> > > I  think you should fix this explanation:
> >
> > Thanks! I would like to copy the context you provide to the PIP
> motivation,
> > your description is more detailed, so developers don't have to go through
> > the code.
> >
>
> Sure
>
>
> >
> > > Today the quota is checked periodically, right? So that's how the
> > operator
> > > knows the cost in terms of I/O is limited.
> > > Now you are adding one additional I/O per collection, every 1 min by
> > > default. That's a lot perhaps. How long is the check interval today?
> >
> > Actually, I don't want to introduce additional costs, I thought we
> > could cache its result, so that it won't introduce additional costs.
> > It may be that I did not make it clear in the PIP and caused this
> > misunderstanding, sorry.
> >
>
> Ok, just to verify: You plan to modify the code that runs periodically the
> backlog quota check, so the result will be cached there? This way when you
> pull that information from that code every 1min to expose it as a metric it
> will have 0 I/O cost?
>
>
>
> >
> > > The user today can calculate quota used for size based limit, since
> there
> > > are two metrics that are exposed today on a topic level: "
> > > pulsar_storage_backlog_quota_limit" and "pulsar_storage_backlog_size".
> > You
> > > can just divide the two to get a percentage.
> > > For the time-based limit, the only metric exposed today is quota
> itself ,
> > "
> > > pulsar_storage_backlog_quota_limit_time".
> >
> > I only noticed `pulsar_storage_backlog_size` but missed
> > `pulsar_storage_backlog_quota_limit` and
> > `pulsar_storage_backlog_quota_limit_time`. Many thanks for your reminder.
> >
> >
> > So, in this condition, we already have the following topic-level metrics:
> > `pulsar_storage_backlog_size`: The total backlog size of this
> > topic owned by this broker (in bytes).
> > `pulsar_storage_backlog_quota_limit`: The size limit (in bytes) that the
> > backlog quota enforces on this topic.
> > `pulsar_storage_backlog_quota_limit_time`: The backlog quota limit in
> > time (seconds). (This metric does not exist in the docs; this needs to be improved.)
> >
> >
> > We just need to add a new metric named
> > `pulsar_storage_earliest_msg_publish_time_in_backlog` in the topic-level
> > that indicates the publish time of the earliest message in the backlog.
> > So users could get `pulsar_backlog_size_quota_used_percentage` by dividing
> > `pulsar_storage_backlog_size` by `pulsar_storage_backlog_quota_limit`,
> > and could get `pulsar_backlog_time_quota_used_percentage` by dividing
> > `now - pulsar_storage_earliest_msg_publish_time_in_backlog` by
> > `pulsar_storage_backlog_quota_limit_time`.
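
The two derived percentages described above amount to simple arithmetic over the existing metric values. A minimal sketch — class and method names are illustrative, not part of the Pulsar API:

```java
// Illustrative helper for the two derived percentages discussed above.
// Class and method names are hypothetical, not part of the Pulsar API.
public class BacklogQuotaUsage {

    // pulsar_storage_backlog_size / pulsar_storage_backlog_quota_limit, as a percentage
    public static double sizeQuotaUsedPercentage(long backlogSizeBytes, long quotaLimitBytes) {
        if (quotaLimitBytes <= 0) {
            return 0.0; // quota disabled or unset
        }
        return 100.0 * backlogSizeBytes / quotaLimitBytes;
    }

    // (now - pulsar_storage_earliest_msg_publish_time_in_backlog) divided by
    // pulsar_storage_backlog_quota_limit_time, as a percentage
    public static double timeQuotaUsedPercentage(long nowMillis, long earliestPublishTimeMillis,
                                                 long quotaLimitSeconds) {
        if (quotaLimitSeconds <= 0 || earliestPublishTimeMillis <= 0) {
            return 0.0; // no quota, or backlog is empty
        }
        long backlogAgeSeconds = (nowMillis - earliestPublishTimeMillis) / 1000;
        return 100.0 * backlogAgeSeconds / quotaLimitSeconds;
    }
}
```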
> >
>
> I think there is a problem with the name
> `pulsar_storage_earliest_msg_publish_time_in_backlog` in the topic-level:
> * First, I prefer exposing the age rather than the publish time.
> * Second, it's a bit hard to figure out the meaning of the earliest msg in
> the backlog.
>
> Maybe `pulsar_storage_backlog_age_seconds`? In the explanation you can
> write: "The age (time passed since it was published) of the earliest
> unacknowledged message based on the topic's
> existing subscriptions" ?
>
>
>
> >
> > The backlog quota time checker runs periodically, so we can cache its
> > result, so it won't add much cost.
> >
> > Pulsar also exposes subscription-level `backlogSize` and
> > `earliestMsgPublishTimeInBacklog` in Pulsar-Admin
> > <
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139
> > >
> > if
> > `subscriptionBacklogSize` and `getEarliestTimeInBacklog` are true.
> > We can also expose `backlogQuotaLimitSize` and `backlogQuotaLimitTime`
> of
> > the topic to PulsarAdmin.
> >
>
> What is the relationship you see between Pulsar exposing
> subscriptionBacklogSize and earliestMsgPublishTimeInBacklog in
> subscription, to exposing the backlog quota limits in pulsar admin?
>
> Limits can be exposed to Pulsar Admin, since it

Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-02 Thread
> …the subscription which
> has the oldest unacknowledged message. Since acknowledged messages can be
> interleaved with unacknowledged messages, calculating the exact size of
> that subscription can be expensive as it requires I/O operations to read
> from the messages from the ledgers.
> For that reason, the topic backlog is actually defined to be the estimated
> backlog size of that subscription. It does so by summarizing the size of
> all the ledgers, starting from the current active one, up to the ledger
> which contains the oldest unacknowledged message (There is actually a
> faster way to calculate it, but this is the definition of the estimation).
>
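
The estimation described above can be sketched as a sum over ledger metadata. `LedgerInfo` here is an illustrative stand-in record, not Pulsar's actual MLDataFormats class:

```java
import java.util.List;

// Sketch of the topic-backlog estimation described above: sum the sizes of all
// ledgers from the one containing the oldest unacknowledged message up to the
// current active ledger. LedgerInfo is an illustrative stand-in, not the real
// Pulsar class.
public class BacklogEstimator {

    public record LedgerInfo(long ledgerId, long sizeBytes) {}

    public static long estimateBacklogSize(List<LedgerInfo> ledgers, long oldestUnackedLedgerId) {
        long total = 0;
        for (LedgerInfo ledger : ledgers) {
            // Ledger IDs increase monotonically, so every ledger from the one
            // holding the oldest unacked message onwards counts as backlog.
            if (ledger.ledgerId() >= oldestUnackedLedgerId) {
                total += ledger.sizeBytes();
            }
        }
        return total;
    }
}
```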
> A topic backlog age is the age of the oldest unacknowledged message (in any
> subscription). If that message was written 30 minutes ago, its age is 30
> minutes.
>
> Pulsar has a feature called backlog quota (place link). It allows the user
> to define a quota - in effect, a limit - which limits the topic backlog.
> There are two types of quotas:
> * Size based: The limit is for the topic backlog size (as we defined
> above).
> * Time based: The limit is for the topic's backlog age (as we defined
> above).
>
> Once a topic backlog exceeds either one of those limits, an action is taken
> upon messages written to the topic:
> * The producer write is placed on hold for a certain amount of time before
> failing.
> * The producer write is failed
> * The subscriptions' oldest unacknowledged messages will be acknowledged in
> order until both the topic backlog size and age fall inside the limit
> (quota). This process is called backlog eviction (it happens every interval).
>
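
The eviction process in the last bullet can be modeled roughly as follows — the `Msg` record and the limits are hypothetical stand-ins, not Pulsar internals:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Rough model of backlog eviction: acknowledge the oldest messages until the
// backlog size and age are both back under the quota. Msg and the limit
// parameters are illustrative, not Pulsar internals.
public class EvictionSketch {

    public record Msg(long sizeBytes, long ageSeconds) {}

    // Returns the number of messages evicted (acknowledged).
    public static int evict(Deque<Msg> backlog, long sizeLimitBytes, long ageLimitSeconds) {
        int evicted = 0;
        while (!backlog.isEmpty()
                && (totalSize(backlog) > sizeLimitBytes
                        || backlog.peekFirst().ageSeconds() > ageLimitSeconds)) {
            backlog.pollFirst(); // acknowledge the oldest message, in order
            evicted++;
        }
        return evicted;
    }

    private static long totalSize(Deque<Msg> backlog) {
        return backlog.stream().mapToLong(Msg::sizeBytes).sum();
    }
}
```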
> The quotas can be defined as a default value for any topic, by using the
> following broker configuration keys: backlogQuotaDefaultLimitBytes ,
> backlogQuotaDefaultLimitSecond. It can also be specified directly for all
> topics in a given namespace using the namespace policy, or a specific topic
> using a topic policy.
>
> The user today can calculate quota used for size based limit, since there
> are two metrics that are exposed today on a topic level: "
> pulsar_storage_backlog_quota_limit" and "pulsar_storage_backlog_size". You
> can just divide the two to get a percentage.
> For the time-based limit, the only metric exposed today is quota itself , "
> pulsar_storage_backlog_quota_limit_time".
>
> 
>
> I would create two metrics:
>
> `pulsar_backlog_size_quota_used_percentage`
> `pulsar_backlog_time_quota_used_percentage`
>
> You would like to know what triggered the alert, hence two.
> It's not the quota percentage, it's the quota used percentage.
>
> --
>
> It checks if the backlog size exceeds the threshold(
> > backlogQuotaDefaultLimitBytes), and it gets the current backlog size by
> > calculating LedgerInfo
> > <
> https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/proto/MLDataFormats.proto#L54
> >,
> > it will not lead to I/O.
>
> This is not correct.
> It checks against the topic / namespace policy, and if it doesn't exist, it
> falls back on the default configuration key mentioned above.
>
> It checks if the backlog time exceeds the threshold(
> > backlogQuotaDefaultLimitSecond). If preciseTimeBasedBacklogQuotaCheck is
> > set to be true, it will read an entry from Bookkeeper, but the default
> > value is false, which means it gets the backlog time by calculating
> > LedgerInfo
> > <
> https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/proto/MLDataFormats.proto#L54
> >.
> > So in general, we don't need to worry about it will lead to I/O.
>
>
> I'm afraid of that.
> Today the quota is checked periodically, right? So that's how the operator
> knows the cost in terms of I/O is limited.
>  Now you are adding one additional I/O per collection, every 1 min by
> default. That's a lot perhaps. How long is the check interval today?
>
> Perhaps in the backlog quota check, you can persist the check result, and
> use it? Persist the age that is.
>
>
> --
>
> Regarding "slowest_subscription"
> I think the cost is too high, because the subscriptions will keep
> alternating, which can generate many unique time series. Since
> Prometheus flushes only every 2 hours, as does any other TSDB, it will cost
> you too much.
>
> I suggest exposing the name via the topic stats. This way they can issue a
> REST call to grab that subscription name only when the alert fires.
>
> Thanks,
>
> Asaf
>
>
>
>
>
> On Tue, Feb 28, 2023 at 9:29 AM 太上玄元道君  wrote:
>
> > Hi Asaf,
> > I've updated the PIP, PTAL
> >
> > Thank,
> > Tao Jiuming
> >

Re: [ANNOUNCE] New committer: Yuri Mizushima

2023-02-27 Thread
congrats!

Thanks,
Tao Jiuming

Nozomi Kurihara wrote on Mon, Feb 27, 2023 at 15:47:

> Hi everyone,
>
> The Project Management Committee (PMC) for Apache Pulsar
> has invited Yuri Mizushima (https://github.com/equanz) to become a
> committer and we are pleased
> to announce that he has accepted.
>
> Yuri has been an active contributor for over 3 years. He has constantly
> sent Pull-Requests for Pulsar and its clients. Moreover he proposed PIP-79
> and gave a talk at Pulsar Summit NA 2021.
>
> Please join us in congratulating and welcoming Yuri onboard!
>
> Thanks,
> Nozomi
>


Re: [Vote] PIP-245: Make subscriptions of non-persistent topic non-durable

2023-02-23 Thread
Hi Rajan,

Even though we don't persist NonPersistentSubscription's state, it has an
`isDurable` field
```
public NonPersistentSubscription(NonPersistentTopic topic, String subscriptionName, boolean isDurable,
                                 Map<String, String> properties) {
    this.topic = topic;
    this.topicName = topic.getName();
    this.subName = subscriptionName;
    this.fullName = MoreObjects.toStringHelper(this).add("topic", topicName).add("name", subName).toString();
    IS_FENCED_UPDATER.set(this, FALSE);
    this.lastActive = System.currentTimeMillis();
    this.isDurable = isDurable;
    this.subscriptionProperties = properties != null
            ? Collections.unmodifiableMap(properties) : Collections.emptyMap();
}
```
When users create a `Consumer`, the `durable` flag is set to `true`.

And in the `removeConsumer` method of `NonPersistentSubscription`, after the
consumers have disconnected, we can remove the `durable` subscription
automatically.
```
public synchronized void removeConsumer(Consumer consumer, boolean isResetCursor)
        throws BrokerServiceException {
    updateLastActive();
    if (dispatcher != null) {
        dispatcher.removeConsumer(consumer);
    }
    // preserve accumulative stats from removed consumer
    ConsumerStatsImpl stats = consumer.getStats();
    bytesOutFromRemovedConsumers.add(stats.bytesOutCounter);
    msgOutFromRemovedConsumer.add(stats.msgOutCounter);
    if (!isDurable) {
        topic.unsubscribe(subName);
    }

    // invalid consumer remove will throw an exception
    // decrement usage is triggered only for valid consumer close
    topic.decrementUsageCount();
    if (log.isDebugEnabled()) {
        log.debug("[{}] [{}] [{}] Removed consumer -- count: {}", topic.getName(), subName,
                consumer.consumerName(), topic.currentUsageCount());
    }
}
```

The purpose of this PIP is to deprecate `durable` subscriptions on
NonPersistentTopic, for the following reasons:
1. We don't persist NonPersistentSubscription's state; it's purely
in-memory state, so keeping the `isDurable` field is meaningless and may cause
misunderstanding.
2. Expired subscriptions are cleaned up automatically, as described in the PIP.

Thanks,
Tao Jiuming

Rajan Dhabalia wrote on Thu, Feb 23, 2023 at 12:33:

> > But for NonPersistentTopic, creating a Durable subscription is
> meaningless, NonPersistentSubscription doesn't have a ManagedCursor to
> persistent its data. After its consumer disconnected, the subscription
> couldn't be removed automatically if we didn't set the value of
> subscriptionExpirationTimeMinutes greater than 0.
> This is not correct. Non-persistent topics don't create durable
> subscriptions; they create NonPersistentSubscriptions without storing their
> state.
>
> > if we set the value of subscriptionExpirationTimeMinutes greater than 0,
> it may lead to data loss(The durable subscriptions of PersistentTopic also
> can be removed).
> Non-persistent topics don't provide data persistence or dispatch guarantees,
> and that's by design, so let's not try to change the semantics of
> non-persistent topics; there can be data loss in a non-persistent topic.
>
> > And the Non-durable subscriptions will be removed automatically after all
> the consumers disconnected, it's the existing logic.
> Why can't we do this in the existing NonPersistent Subscription? I really
> don't understand the purpose of this PIP?
>
> Thanks,
> Rajan
>
> On Sun, Feb 12, 2023 at 10:56 PM Jiuming Tao  >
> wrote:
>
> > Hi all,
> >
> > I would like to start a VOTE on `PIP-245: Make subscriptions of
> > non-persistent topic non-durable`.
> >
> > Motivation:
> >
> > There are two types of subscriptions for a topic: Durable and
> Non-durable.
> >
> > We create a Consumer with a Durable subscription and a Reader with a
> > Non-durable subscription.
> >
> > But for NonPersistentTopic, creating a Durable subscription is
> > meaningless, NonPersistentSubscription doesn't have a ManagedCursor to
> > persistent its data. After its consumer disconnected, the subscription
> > couldn't be removed automatically if we didn't set the value of
> > subscriptionExpirationTimeMinutes greater than 0.
> >
> > For subscriptionExpirationTimeMinutes, it controls the subscription
> > expiration of NonPersistentTopic and PersistentTopic, if we set the value
> > of subscriptionExpirationTimeMinutes greater than 0, it may lead to data
> > loss(The durable subscriptions of PersistentTopic also can be removed).
> >
> > And the Non-durable subscriptions will be removed automatically after all
> > the consumers disconnected, it's the existing logic.
> >
> > For the purpose of removing the subscriptions which have no active
> > consumers of NonPersistentTopic and the above reasons, we can make all
> the
> > subscriptions of a NonPersistentTopic Non-durable.
> >
> >
> >
> > For more details, you can read:
> > https://github.com/apache/pulsar/issues/19448 <
> 

[Discuss] PIP-248: Add backlog eviction metric

2023-02-22 Thread
Hi all,

I've started a PIP to discuss: PIP-248 Add backlog eviction metric

### Motivation:

Pulsar has 2 configurations for the backlog eviction:
`backlogQuotaDefaultLimitBytes` and `backlogQuotaDefaultLimitSecond`; if the
topic backlog reaches either threshold, backlog eviction will be
triggered.

Before backlog eviction happens, we don't have a metric to monitor how close
the backlog is to reaching the threshold.

We can provide a progress-bar-style metric to tell users that some topics are
about to trigger backlog eviction, and users can subscribe to the alert to
schedule consumers.
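
A minimal sketch of the alert condition such a metric would enable — the threshold ratio and all names here are illustrative, not Pulsar configuration keys:

```java
// Illustrative check for "about to trigger backlog eviction": fires when the
// backlog has used more than alertRatio of either the size or the time quota.
// All names here are hypothetical, not Pulsar configuration keys.
public class BacklogEvictionAlert {

    public static boolean nearEviction(long backlogBytes, long limitBytes,
                                       long backlogAgeSeconds, long limitSeconds,
                                       double alertRatio) {
        boolean sizeNear = limitBytes > 0 && backlogBytes >= alertRatio * limitBytes;
        boolean timeNear = limitSeconds > 0 && backlogAgeSeconds >= alertRatio * limitSeconds;
        return sizeNear || timeNear;
    }
}
```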

For more details, please read the PIP at
https://github.com/apache/pulsar/issues/19601

Thanks,
Tao Jiuming