[VOTE] PIP-318: Don't retain null-key messages during topic compaction

2023-11-09 Thread Cong Zhao
Hi Community,

This thread is to start a vote for PIP-318.

PIP: https://github.com/apache/pulsar/pull/21541
Discussion thread: 
https://lists.apache.org/thread/68k6vrghfp3np601lrfx5mbfmghbbrjh

Thanks,
Cong Zhao


Re: [VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-09 Thread PengHui Li
+1 (binding)

- Checked the signature
- Built from source and ran the tests

Regards,
Penghui

On Thu, Nov 9, 2023 at 5:15 PM Yunze Xu  wrote:

> +1 (binding)
>
> - Verified checksum and signatures
> - Built from source and ran unit tests against the test standalone on
> macOS Ventura 13.4 and Apple Clang 14.0.3
>
> A note on the failed LookupServiceTest.testMultiAddresses I mentioned
> before: the test has an overly strict requirement that localhost
> be mapped to 127.0.0.1, while on my macOS machine localhost
> was mapped to IPv6 ::1 first. The test passed after I modified the
> /etc/hosts file.
>
> Thanks,
> Yunze
>
> On Thu, Nov 9, 2023 at 11:42 AM Yunze Xu  wrote:
> >
> > Hi Penghui,
> >
> > It's caused by the relative path, and I explained it in that issue. Here
> > is an improvement for tests:
> > https://github.com/apache/pulsar-client-cpp/pull/340.
> >
> > Thanks,
> > Yunze
>


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-09 Thread Girish Sharma
Hello Lari, replies inline

On Thu, Nov 9, 2023 at 6:50 AM Lari Hotari  wrote:

> Hi Girish,
>
> replies inline.
>
> On Thu, 9 Nov 2023 at 00:29, Girish Sharma wrote:
> > While dual-rate dual token bucket looks promising, there is still some
> > challenge with respect to allowing a certain peak burst for/up to a
> > bigger duration. I am explaining it below:
>
> > Assume a 10MBps topic. Bursting support of 1.5x up to 2 minutes, once
> > every 10 minute interval.
>
> There are many ways to model a dual token bucket.
> When there are tokens in the bucket, they are consumed as fast as
> possible. This is why there is a need for the second token bucket
> which is used to rate limit the traffic to the absolute maximum rate.
> Technically the second bucket rate limits the average rate for a short
> time window.
>
> I'd pick the first bucket for handling the 10MB rate.
> The capacity of the first bucket would be 15MB/s * 120s = 1800MB. The fill
> would happen in a special way. I'm not sure if Bucket4J has this at all.
> So describing the way of adding tokens to the bucket: the tokens in
> the bucket would remain the same when the rate is <10MB. As many
>

How is this special behavior (tokens in bucket remaining the same when rate
is <10MB) achieved? I would assume that to even figure out that the rate is
less than 10MB, there is some counter going around?


> tokens would be added to the bucket as are consumed by the actual
> traffic. The left over tokens 10MB - actual rate would go to a
> separate filling bucket that gets poured into the actual bucket every
> 10 minutes.
> This first bucket with this separate "filling bucket" would handle the
> bursting up to 1800MB.
>

But this isn't the requirement? Let's assume that the actual traffic has
been 5MB for a while and this 1800MB capacity bucket is all filled up now.
What is the real use of that at all?


> The second bucket would solely enforce the 1.5x limit of 15MB rate
> with a small capacity bucket which enforces the average rate for a
> short time window.
> There's one nuance here. Bursting is only possible if the average rate
> has been lower than 10MBps for long enough to accumulate the tokens
> that the burst consumes.
> It would be possible that for example 50% of the tokens would be
> immediately available and 50% of the tokens are made available in the
> "filling bucket" that gets poured into the actual bucket every 10
> minutes. Without having some way to earn the burst, I don't think that
> there's a reasonable way to make things usable. The 10MB limit
> wouldn't have an actual meaning unless that is used to "earn" the
> tokens to be used for the burst.
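
A minimal sketch of the generic dual token bucket idea discussed above, using the figures from the thread (10MB/s base rate, 15MB/s peak rate, 15 * 120 = 1800MB burst capacity). It is not Pulsar's or Bucket4J's implementation, and it deliberately omits the separate "filling bucket" that is poured in every 10 minutes. The class names, the one-second peak window, and the choice to start both buckets full are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    rate_mb_per_s: float  # refill rate
    capacity_mb: float    # upper bound on stored tokens
    tokens_mb: float      # tokens currently available

    def refill(self, elapsed_s: float) -> None:
        # Add tokens for the elapsed time, never exceeding the capacity.
        self.tokens_mb = min(self.capacity_mb,
                             self.tokens_mb + self.rate_mb_per_s * elapsed_s)


class DualTokenBucket:
    """Hypothetical limiter: traffic passes only if BOTH buckets have tokens."""

    def __init__(self, base_rate, peak_rate, burst_capacity, peak_window_s=1.0):
        # Bucket 1 refills at the base rate; its capacity bounds the burst volume.
        # Assumption: it starts full, i.e. the burst is already "earned".
        self.burst = Bucket(base_rate, burst_capacity, burst_capacity)
        # Bucket 2 has a small capacity and caps the rate over a short window.
        peak_capacity = peak_rate * peak_window_s
        self.peak = Bucket(peak_rate, peak_capacity, peak_capacity)

    def advance(self, elapsed_s: float) -> None:
        self.burst.refill(elapsed_s)
        self.peak.refill(elapsed_s)

    def try_consume(self, size_mb: float) -> bool:
        if self.burst.tokens_mb >= size_mb and self.peak.tokens_mb >= size_mb:
            self.burst.tokens_mb -= size_mb
            self.peak.tokens_mb -= size_mb
            return True
        return False


limiter = DualTokenBucket(base_rate=10, peak_rate=15, burst_capacity=15 * 120)
first = limiter.try_consume(15)   # both buckets have tokens
second = limiter.try_consume(1)   # peak bucket is now empty
limiter.advance(1.0)              # one second passes
third = limiter.try_consume(15)   # peak bucket has refilled
```

In this simplified form the burst bucket keeps refilling at the base rate while a burst is in progress, so the length of a sustained burst depends on both the capacity and the refill rule chosen.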
>
>
I think this way of thinking about a rate limiter, "earning the right
to burst by letting tokens remain in the bucket (by producing below
10MB for a while)", does not fit a messaging use case well, in the real
world or in theory.
For a 10MB topic, if the actual produce rate has been, say, 5MB for a long
while, this shouldn't give that topic the right to burst to 15MB for as
long as tokens are present. This is purely because doing so would
start stressing the network and bookie disks.
Imagine 100 such topics with a similar configuration of
fixed+burst limits that were doing way lower than the fixed rate for the
past couple of hours. Now that they've earned enough tokens, if they all
start bursting, this will bring down the system, which is probably not
capable of supporting simultaneous peaks of all possible topics.
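
To put rough numbers on this, here is a small worked calculation. It assumes that each of the 100 topics uses the example configuration from earlier in the thread (10MB/s fixed rate, 1.5x burst); the function name is hypothetical.

```python
def aggregate_rates(topics, base_rate_mb_s, burst_factor):
    """Return (steady, peak, extra) aggregate throughput in MB/s."""
    steady = topics * base_rate_mb_s   # every topic at its fixed rate
    peak = steady * burst_factor       # every topic bursting at once
    return steady, peak, peak - steady


steady, peak, extra = aggregate_rates(topics=100, base_rate_mb_s=10,
                                      burst_factor=1.5)
```

Under these assumptions, simultaneous bursting adds 500MB/s on top of the 1000MB/s the cluster would carry if every topic produced at its fixed rate.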

Now of course we can use a broker-level fixed rate limiter to keep
the overall throughput of the system from going beyond a certain number, but at
that point the earning semantics are lost anyway, since it would be
unknown which topics are getting through with bursting and
which are being blocked by the broker-level fixed rate limiting.

As such, letting topics loose would not sit well with any sort of SLA
guarantees to the end user.

Moreover, contrary to the earning tokens logic, in reality a topic _should_
be allowed to burst up to the SOP/SLA as soon as produce starts in the
topic. It shouldn't _have_ to wait for tokens to fill up by staying
below the fixed rate for a while before it is allowed to burst. This is because
there is no real benefit or reason to not let the topic do so, since the
hardware is already present and the topic is already provisioned
(partitions, broker spread) accordingly, assuming the burst.

In an algorithmic/academic/literature setting, token bucket sounds really
promising, but a platform with an SLA to users would not run like that.



> In the current rate limiters in Pulsar, the implementation is not
> optimized for how Pulsar uses rate limiting. There's no need to use a
> scheduler for adding "permits" as it's called in the current rate
> limiter. The new tokens to add can be calculated based on the elapsed
> time. Resuming from the blocking state (auto read disabled 
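
The elapsed-time calculation mentioned above can be sketched as follows. This is a generic illustration, not Pulsar's rate limiter: the class name, the injectable clock, and starting with a full bucket are assumptions. No scheduler thread adds permits; the tokens owed are computed lazily from the time elapsed since the last call.

```python
import time


class LazyTokenBucket:
    """Hypothetical token bucket that refills on demand from elapsed time."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # assumption: start full
        self.last_refill = clock()

    def _refill(self):
        # Credit the tokens owed for the time elapsed since the last refill.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.last_refill = now

    def try_acquire(self, permits):
        self._refill()
        if self.tokens >= permits:
            self.tokens -= permits
            return True
        return False


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
bucket = LazyTokenBucket(rate_per_s=10, capacity=10, clock=clock)
drained = bucket.try_acquire(10)     # uses the initial tokens
rejected = bucket.try_acquire(5)     # nothing left yet
clock.now += 0.5                     # half a second later: 5 tokens owed
recovered = bucket.try_acquire(5)
```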

Re: [DISCUSS] Replace stale bot with ping-pong workflow

2023-11-09 Thread tison
Thank you!

I saw a few other complaints about this "noise" from issue authors recently
as well.

Since the real-world solution is handling issues effectively, I support
disabling this action. Let's keep the PR open for a while to accept comments.

I'll merge this patch next Monday if there are no further objections.

Best,
tison.


On Thu, Nov 9, 2023 at 21:21, Asaf Mesika wrote:

> Submitted a PR to disable it: https://github.com/apache/pulsar/pull/21549
>
> On Tue, Nov 7, 2023 at 3:58 PM Asaf Mesika  wrote:
>
> > Tison let's start as you suggested by disabling it
> >
> >
> > On Tue, May 16, 2023 at 5:13 AM Yunze Xu  wrote:
> >
> >> +1 to me
> >>
> >> Thanks,
> >> Yunze
> >>
> >> On Sun, May 14, 2023 at 9:28 PM Dave Fisher wrote:
> >> >
> >> > Hi -
> >> >
> >> > I have not looked at all your links but I think this is a great idea.
> >> > This will help everyone pay attention better.
> >> >
> >> > Best,
> >> > Dave
> >> >
> >> > Sent from my iPhone
> >> >
> >> > > On May 14, 2023, at 12:33 AM, tison  wrote:
> >> > >
> >> > > Of course, changing the workflow cannot magically increase the
> >> > > bandwidth to handle stale issues. That is what the triage guide
> >> > > wants to encourage committers to practice. But such a move can
> >> > > reduce the frustrating experience and explicitly express who is
> >> > > responsible for taking the next action to nudge the conversation.
> >> > >
> >> > > Best,
> >> > > tison.
> >> > >
> >> > >
> >> > > On Sun, May 14, 2023 at 15:28, tison wrote:
> >> > >
> >> > >> Hi devs,
> >> > >>
> >> > >> Recently, I have handled a large number of stale issues and noticed
> >> > >> that periodically notifying users that "the issue is stale" without
> >> > >> any human reaction can be a frustrating experience, e.g.,
> >> > >> ISSUE-13925[1].
> >> > >>
> >> > >> Learning from the INFRA JIRA project experience, I propose we
> >> > >> replace the stale bot with a ping-pong workflow. That is -
> >> > >>
> >> > >> ping - Labeling waiting-for-reviewer on issues created or commented
> >> > >> on by non-committers
> >> > >> pong - Labeling waiting-for-user on issues responded to by committers
> >> > >>
> >> > >> Here is a demo implementation[2] you can refer to, and you can try
> >> > >> the workflow in my fork[3].
> >> > >>
> >> > >> Previous references -
> >> > >>
> >> > >> * The triage guide[4]
> >> > >> * [DISCUSS] Does stale bot make value for you?[5]
> >> > >> * [COMMITTER ATTENTION] You can close stale issues as not planned[6]
> >> > >>
> >> > >> Looking forward to your feedback :D
> >> > >>
> >> > >> Best,
> >> > >> tison.
> >> > >>
> >> > >> [1] https://github.com/apache/pulsar/issues/13925
> >> > >> [2] https://github.com/apache/pulsar/pull/20319
> >> > >> [3] https://github.com/tisonkun/pulsar
> >> > >> [4] https://pulsar.apache.org/contribute/develop-triage
> >> > >> [5] https://lists.apache.org/thread/tv774jqohdpx8x0dymsskrd90xwwfvgp
> >> > >> [6] https://lists.apache.org/thread/x2c7xod8y0wvh14nsb6bknf0dq3r9gls
> >> > >>
> >> > >>
> >> >
> >>
> >
>
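
The ping-pong labeling rule quoted above can be sketched as a small pure function. This is only an illustration of the rule as described in the proposal, not the demo implementation from the linked PR; the function name is hypothetical, and removing the opposite label on each transition is an assumption.

```python
PING = "waiting-for-reviewer"  # set when a non-committer creates or comments
PONG = "waiting-for-user"      # set when a committer responds


def apply_ping_pong(labels, actor_is_committer):
    """Return the issue's label set after an issue is created or commented on."""
    # Assumption: the two workflow labels are mutually exclusive.
    updated = set(labels) - {PING, PONG}
    updated.add(PONG if actor_is_committer else PING)
    return updated


after_report = apply_ping_pong({"type/bug"}, actor_is_committer=False)
after_reply = apply_ping_pong(after_report, actor_is_committer=True)
```

The label then states explicitly who is expected to take the next action on the issue.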


Re: [DISCUSS] Replace stale bot with ping-pong workflow

2023-11-09 Thread Asaf Mesika
Submitted a PR to disable it: https://github.com/apache/pulsar/pull/21549



Re: [VOTE] Pulsar Release 3.0.2 Candidate 2

2023-11-09 Thread Lari Hotari
Any updates on the 3.0.2 release?

-Lari

On 2023/10/30 03:23:06 Yubiao Feng wrote:
> Update:
>  blocked by https://github.com/apache/pulsar/pull/21445
> 
> I will do a new candidate later
> 
> Thanks
> Yubiao Feng
> 
> On Fri, Oct 27, 2023 at 10:50 PM Yubiao Feng wrote:
> 
> > This is the second release candidate for Apache Pulsar version 3.0.2.
> >
> > It fixes the following issues:
> >
> > https://github.com/apache/pulsar/pulls?q=is%3Apr+is%3Amerged+label%3Arelease%2F3.0.2+label%3Acherry-picked%2Fbranch-3.0+
> >
> > *** Please download, test and vote on this release. This vote will
> > stay open for at least 72 hours ***
> >
> > Note that we are voting upon the source (tag), binaries are provided
> > for convenience.
> >
> > Source and binary files:
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-3.0.2-candidate-2/
> >
> > SHA-512 checksums:
> >
> > a4eaf3afabfe89f34d4cd29b2bc63ad2219a729319c159bae17940cede3afcf8aebd4d467b9f5226a17ab0b7e878300038364d4d122193aa3494f3b9bad0b3cc  apache-pulsar-3.0.2-bin.tar.gz
> > 9e0103f93e00c09c5db8a4cdf1b7d135bed5f0aa5f1c40a52d8caf4f3d269ca4972e25e87d8c0254212e7e089421ede1a92a608824fd1a240c05372b349ed095  apache-pulsar-3.0.2-src.tar.gz
> >
> > Maven staging repo:
> > https://repository.apache.org/content/repositories/orgapachepulsar-1245/
> >
> > The tag to verify:
> > v3.0.2-candidate-2 (cd5d2bef8d65c0f6158b8eb4b7ca7fbbde7028c1)
> > https://github.com/apache/pulsar/releases/tag/v3.0.2-candidate-2
> >
> > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >
> > Docker images:
> >
> > pulsar images:
> > https://hub.docker.com/repository/docker/9947090/pulsar
> >
> > pulsar-all images:
> > https://hub.docker.com/repository/docker/9947090/pulsar-all
> >
> > Please download the source package, and follow the README to build
> > and run the Pulsar standalone service.
> >
> >
> > Regards
> > Yubiao Feng(poorbarcode)
> >
> 


Re: [VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-09 Thread Yunze Xu
+1 (binding)

- Verified checksum and signatures
- Built from source and ran unit tests against the test standalone on
macOS Ventura 13.4 and Apple Clang 14.0.3

A note on the failed LookupServiceTest.testMultiAddresses I mentioned
before: the test has an overly strict requirement that localhost
be mapped to 127.0.0.1, while on my macOS machine localhost
was mapped to IPv6 ::1 first. The test passed after I modified the
/etc/hosts file.
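
A quick way to see how localhost resolves on a given machine is sketched below. It uses Python's standard `socket.getaddrinfo` and does not reproduce the C++ test itself; the helper name is hypothetical, and the order returned by the system resolver may not mirror the order of entries in /etc/hosts.

```python
import socket


def resolved_addresses(host="localhost"):
    """Return the distinct addresses for host, in the order the resolver gives."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # host could not be resolved at all
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


addresses = resolved_addresses()
# True when the first result is the IPv4 loopback address.
ipv4_first = bool(addresses) and addresses[0] == "127.0.0.1"
```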

Thanks,
Yunze

On Thu, Nov 9, 2023 at 11:42 AM Yunze Xu  wrote:
>
> Hi Penghui,
>
> It's caused by the relative path, and I explained it in that issue. Here
> is an improvement for tests:
> https://github.com/apache/pulsar-client-cpp/pull/340.
>
> Thanks,
> Yunze