Re: [DISCUSS] PIP-247: Notifications for partitions update

2023-03-06 Thread Michael Marshall
Thanks for the context Xiaoyu Hou and Asaf. I appreciate the
efficiencies that we can gain by creating a specific implementation
for the partitioned topic use case. I agree that this new notification
system makes sense based on Pulsar's current features, and I have some
implementation questions.

>- If the broker sends a notification and it's lost due to network issues,
> you'll only know about it due to the client doing constant polling, using
> its hash to minimize response.

I see that we implemented an ack mechanism to get around this. I
haven't looked closely, but is there a reason that we couldn't also
use this to improve PIP 145?

Since we know we're using a TCP connection, is it possible to rely on
pulsar's keep alive timeout (the broker and the client each have their
own) to close a connection that isn't responsive? Then, when the
connection is re-established, the client would get the latest topic
partition count.

Regarding the connection, which connection should the client use to
send the watch requests? At the moment, the "parent" partitioned topic
does not have an owner, but perhaps it would help this design to make
a single owner for a given partitioned topic. This could trivially be
done using the existing bundle mapping. Then, all watchers for a given
partitioned topic would be hosted on the same broker, which should be
more efficient. I don't think we currently redirect clients to any
specific bundle when creating the metadata for a partitioned topic,
but if we did, then we might be able to remove some edge cases for
notification delivery because a single broker would update the
metadata store and then trigger the notifications to the clients. If
we don't use this implementation, do we plan on using metadata store
notifications to trigger the callbacks that trigger notifications sent
to the clients?

> - On each metadata update you'll need to run it through the regular
> expression, on all topics hosted on the broker, for any given client.
> That's a lot of CPU.
> - Suggested mechanism mainly cares about the count of partitions, so
> it's a lot more efficient.

I forgot the partition count was its own piece of metadata that the
broker can watch for. That part definitely makes sense to me.

One nit on the protobuf for CommandWatchPartitionUpdateSuccess:

repeated string topics = 3;
repeated uint32 partitions = 4;

What do you think about using a repeated message that represents a
pair of a topic and its partition count instead of using two lists?
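
To illustrate what I mean, here is a minimal Java sketch. The names `TopicPartitions` and `zip` are made up for this example and are not part of the PIP. With two parallel lists the receiver has to trust that both have the same length and ordering, while a single list of pairs makes that invariant structural.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pair type: one entry per watched topic.
class TopicPartitions {
    final String topic;
    final int partitions;

    TopicPartitions(String topic, int partitions) {
        this.topic = topic;
        this.partitions = partitions;
    }
}

class PartitionUpdateSketch {
    // Converts the two parallel lists into pairs. The length check below is
    // exactly what a repeated pair message would make unnecessary.
    static List<TopicPartitions> zip(List<String> topics, List<Integer> partitions) {
        if (topics.size() != partitions.size()) {
            throw new IllegalArgumentException("topics and partitions differ in length");
        }
        List<TopicPartitions> pairs = new ArrayList<>();
        for (int i = 0; i < topics.size(); i++) {
            pairs.add(new TopicPartitions(topics.get(i), partitions.get(i)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<TopicPartitions> pairs = zip(List.of("topic-a", "topic-b"), List.of(4, 8));
        for (TopicPartitions p : pairs) {
            System.out.println(p.topic + " has " + p.partitions + " partitions");
        }
    }
}
```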

How will we handle the case where a watched topic does not exist?

I want to touch on authorization. A role should have "lookup"
permission to watch for updates on each partitioned topic that it
watches. As a result, if we allow for a request to watch multiple
topics, some might succeed while others fail. How do we handle partial
success?
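
One option, sketched below in Java, is to answer per topic rather than per request, so the client learns which watches were accepted and which were denied. This is only an illustration under my own assumptions: `WatchResult`, `authorize`, and the `canLookup` predicate are hypothetical and stand in for whatever the broker's real "lookup" permission check looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical result of a multi-topic watch request.
class WatchResult {
    final List<String> watched = new ArrayList<>();
    final List<String> denied = new ArrayList<>();
}

class WatchAuthSketch {
    // canLookup stands in for the broker's "lookup" permission check for the role.
    static WatchResult authorize(List<String> topics, Predicate<String> canLookup) {
        WatchResult result = new WatchResult();
        for (String topic : topics) {
            if (canLookup.test(topic)) {
                result.watched.add(topic);
            } else {
                result.denied.add(topic);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        WatchResult r = authorize(
                List.of("tenant-a/ns/t1", "tenant-b/ns/t2"),
                topic -> topic.startsWith("tenant-a/"));
        System.out.println("watched=" + r.watched + " denied=" + r.denied);
    }
}
```

The alternative is to fail the whole request if any topic is denied, which is simpler on the wire but forces the client to retry topic by topic.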

One interesting detail is that this PIP is essentially aligned with
notifying clients when topic metadata changes while PIP 145 was
related to topic creation itself. An analogous proposal could request
a notification for any topic that gets a new metadata label. I do not
think it is worth considering that case in this design.

Thanks,
Michael

[0] https://lists.apache.org/thread/t4cwht08d4mhp3qzoxmqh6tht8l0728r

On Sun, Mar 5, 2023 at 8:01 PM houxiaoyu  wrote:
>
> Bump. Are there other concerns or suggestions about this PIP :)  Ping @
> Michael @Joe @Enrico
>
> Thanks
> Xiaoyu Hou
>
> On Mon, Feb 27, 2023 at 14:10, houxiaoyu wrote:
>
> > Hi Joe and Michael,
> >
> > I think I misunderstood your earlier replies. Now that I understand them,
> > let me explain again.
> >
> > Besides the reasons Asaf mentioned above, the topic list watcher also has
> > some limits. For example, the length of `topicsPattern.pattern` must be
> > less than `maxSubscriptionPatternLength` [0]. If the consumer subscribes to
> > multiple partitioned topics, `topicsPattern.pattern` may be very long.
> >
> > So I think that it's better to have a separate notification implementation
> > for partition update.
> >
> > [0]
> > https://github.com/apache/pulsar/blob/5d6932137d76d544f939bef27df25f61b4a4d00d/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/TopicListService.java#L115-L126
> >
> > Thanks,
> > Xiaoyu Hou
> >
> > On Mon, Feb 27, 2023 at 10:56, houxiaoyu wrote:
> >
> >> Hi Michael,
> >>
> >> >  I think we just need the client to "subscribe" to a topic notification
> >> for
> >> >  "-partition-[0-9]+" to eliminate the polling
> >>
> >> If Pulsar users want to pub/sub on a partitioned topic, I think most of
> >> them would create a simple producer or consumer like the following:
> >> ```
> >> Producer producer = client.newProducer().topic(topic).create();
> >> producer.sendAsync(msg);
> >> ```
> >> ```
> >> client.newConsumer()
> >> .topic(topic)
> >> .subscriptionName(subscription)
> >> .subscribe();
> >> ```
> >> I think there is no reason for users to use `topicsPattern` if a pulsar
> >> just wants to subscribe a 

Re: Please stop cherry-picking (breaking) changes to the released branches

2023-03-06 Thread Michael Marshall
If we are concerned about correctly ensuring only bug fixes go to
release branches, I think we ought to discuss using a merge git
strategy instead of a cherry pick git strategy. This change was
discussed briefly on this mailing list at the end of 2021 [0]. I will
make a brief argument for making the switch, in case there is
interest.

Ultimately, I think this is related to a project's priorities. Do we
prioritize stability on release branches or velocity on the master
branch? Given that we're moving to the LTS plan with 3.0.0, it could
be a good time to make the switch.

Note that one of the primary rebuttals for this feature is that GitHub
defaults PRs to target master. I don't think that is a real problem
for us, though. Clear documentation and a welcoming community can help
new contributors unfamiliar with git understand the workflow.

The primary issue we would face is split brain on our process where
2.x cherry picks bug fixes while 3.0 merges them forward.

For those unfamiliar with the merge strategy: the general flow is that
a bug fix targets the branch where the bug was introduced (or the
oldest active branch) and then we merge each release branch into the
next oldest release branch until we reach master. The benefits include
always running tests before merging/committing changes, more code
reviews for changes being made to release branches, and a clear sense
of where a commit will end up. Also, the git history for a given
commit will show which branches have the commit, which can remove the
need for descriptive (but potentially misleading) labels on PRs.

Finally, the merge strategy should make it easier to quickly cut patch
releases because branches will always be "ready". That is a huge
benefit for patching security vulnerabilities.

Thanks,
Michael

[0] https://lists.apache.org/thread/zqdqz4jd641vszkj3mzdn6zc3yt56rsk


On Sun, Mar 5, 2023 at 8:52 AM Xiangying Meng  wrote:
>
> >Maybe it is better to start a discussion on the mailing list when you want
> to
> >cherry-pick something and wait for some time.
> >If nobody objects then it is good to go
>
> Because there are a large number of PRs that need to be cherry-picked,
> some PRs may not strictly abide by this agreement when cherry-picking.
> But from the perspective of a release manager,
> I think there are three points that we should abide by.
> 1. As Enrico said in the discussion of starting release 2.10.3,
>  it is better not to include the new last-minute things in the release for
> a stable branch.
> 2. The release manager will send an email to notify everyone after all the
> PRs with the release label are cherry-picked.
>  If there is a new PR that needs to enter this release at this time,
>  it is better to submit a PR to cherry-pick it and leave a message under
> the mail.
> 3. If the newly entered commit is verified, whether it runs a CI in its own
> repository or in pulsar's repository,
>  then we can add the `Verified` label to it.
>  Otherwise, it is best not to add the `Verified` label to this commit.
>
> For example, this commit [1] is a new thing that was just merged before
> starting this release.
> It has not been verified but has a verified label.
> Due to my mistake, after noticing the Verified label, I didn't check
> further to see if the commit was verified.
> This commit introduced some failure in CI, so we now need to re-release.
>
> Of course, this is just a small problem so far.
> Fortunately, there is a new change [2] that needs to be imported to
> branch-2.10 today,
> so we discovered this problem in time.
> After all, the current verification process for the release candidates
> cannot discover these problems.
>
> Normally, such PRs with minor changes and few conflicts with branch-x can
> be directly cherry-picked without running CI,
> but it is best not to do this immediately before a release.
>
> Let's do our best to make the Pulsar community more mature and perfect.
>
>
> Sincerely
> Xiangying
>
> [1]
> https://github.com/apache/pulsar/commit/a3a242cc48e8d7454ee0c5c8fd872ae6f92ae4f7
> [2]  https://github.com/apache/pulsar/pull/19711
>
> On Tue, Feb 28, 2023 at 7:17 PM Enrico Olivelli  wrote:
>
> > On Tue, Feb 28, 2023 at 12:11, PengHui Li wrote:
> > >
> > > > By the way the main point in this email thread is that we should
> > > totally stop to do cherry-picks of stuff that it is
> > > not strictly needed
> > >
> > > Yes, the main issue we need to resolve is how to decide whether
> > > a change strictly needs to be cherry-picked. Do you think it would
> > > work for the author to provide the cherry-pick information, or for
> > > reviewers to add labels and confirm they are correct before merging?
> > > Whoever updates a release/* label must provide the context.
> > > Do not change the release label without any explanation.
> >
> > Maybe it is better to start a discussion on the mailing list when you want
> > to
> > cherry-pick something and wait for some time.
> > If nobody objects 

Re: [VOTE] Pulsar Release 2.10.4 Candidate 1

2023-03-06 Thread Xiangying Meng
Please ignore the previous email. This commit did not break CI.
Instead, two things happened to coincide:
1. There may have been problems with the Maven server at that time. The three PRs
mentioned earlier could not download the correct jar package, and
retrying did not help.
2. A flaky test, `recoverLongTimeAfterMultipleWriteErrors`, failed multiple
times in a row.

So I mistakenly thought the failures were caused by the last unverified commit.
The RC is correct, so please help verify it and vote.

Thanks
Xiangying

On Sun, Mar 5, 2023 at 9:40 PM Xiangying Meng  wrote:

> Hi, community,
>
> Sorry to tell everyone that we may need to abort the release
> 2.10.4-candidate-1 because some CI can not be passed after #19674 [0] is
> cherry-picked.
> I will be sure to carry out the release process again as soon as it is
> resolved.
>
> Sincerely
> Xiangying
> [0] https://github.com/apache/pulsar/pull/19674
>
>
> On Sat, Mar 4, 2023 at 12:06 PM Xiangying Meng 
> wrote:
>
>> This is the first release candidate for Apache Pulsar, version 2.10.4.
>>
>> This release contains 99 commits by 34 contributors.
>> https://github.com/apache/pulsar/compare/v2.10.3...v2.10.4-candidate-1
>>
>> *** Please download, test, and vote on this release. This vote will stay
>> open
>> for at least 72 hours ***
>>
>> Note that we are voting upon the source (tag), binaries are provided for
>> convenience.
>>
>> Source and binary files:
>> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.10.4-candidate-1/
>>
>> SHA-512 checksums:
>> 8cae74a5b586ab2378c2b2737c59507180af4b8efab4a99bc0dae233096036f5b18ab94255bea03e416d8d21958bedf684c8d4bd3982f458a547d3e1efa0f19f
>>  apache-pulsar-2.10.4-bin.tar.gz
>> 74e16c61ff6ae9e2a51e7ae24981598c71dabbff09c820bff9303c031882e1f15d029d06b6b5b6e4cc9a02b8957a102338ce09173c8744a59e5bd848b48b1d2a
>>  apache-pulsar-2.10.4-src.tar.gz
>>
>> Maven staging repo:
>> https://repository.apache.org/content/repositories/orgapachepulsar-1210/
>>
>> The tag to be voted upon:
>> v2.10.4-candidate-1 (d1aebd3e4c9503406845fb2e746a289e88e00fb2)
>> https://github.com/apache/pulsar/releases/tag/v2.10.4-candidate-1
>>
>> Pulsar's KEYS file containing PGP keys you use to sign the release:
>> https://downloads.apache.org/pulsar/KEYS
>>
>> Docker images:
>>
>> 
>>
>> https://hub.docker.com/layers/xiangyingmeng/pulsar/2.10.4/images/sha256-144d0380592a7e0578772eb2fa51da7cad70f1d5f8a2b46189669b15f0e6b4b6?context=repo
>>
>> 
>>
>> https://hub.docker.com/layers/xiangyingmeng/pulsar-all/2.10.4/images/sha256-bcf03c05be93ced24991afbcca13f4a4b5f183d9a7b877ae84e992e16ca599ee?context=repo
>>
>> Please download the source package, and follow the README to build
>> and run the Pulsar standalone service.
>>
>


Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-06 Thread SiNan Liu
Great to see your comment, bo!

1. On the first way: the protobuf website describes the compatibility rules,
but there are no plans to implement them.
https://protobuf.dev/programming-guides/proto/#updating

2. I think this PIP can be divided into two parts.
(1) Add a flag(`ValidatorClassName`), load it into
`ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
ValidatorClassName is empty by default, and the implementation continues as
before, with no change for the user.
```java
ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
    for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
        // The default implementation only checks if the root message has changed.
        if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
            throw new ProtoBufCanReadCheckException(
                    "Protobuf root message isn't allow change!");
        }
    }
};
```
`ValidatorClassName` can also be set to the implementation this PIP adds:
`org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.

(2) Rework `ProtobufNativeSchemaCompatibilityCheck` so that the flag
(`ValidatorClassName`) selects which `ProtobufNativeSchemaValidator` to build.
Isn't that just a plug-in? Users can develop and choose a different
`ProtobufNativeSchemaValidator`. I don't think this changes the existing logic;
it just allows it to be extended.
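
To make the idea concrete, here is a simplified Java sketch of loading a validator by class name and falling back to the default when the flag is empty. It is not the code in the PIP: `Validator`, `RootNameValidator`, and `ValidatorLoader` are stand-in names, and descriptors are reduced to their full names so the example runs without the protobuf library.

```java
import java.util.List;

// Stand-in for ProtobufNativeSchemaValidator; descriptors are reduced to full names.
interface Validator {
    void validate(List<String> fromFullNames, String toFullName);
}

// Mirrors the default behaviour: only the root message name is compared.
class RootNameValidator implements Validator {
    @Override
    public void validate(List<String> fromFullNames, String toFullName) {
        for (String from : fromFullNames) {
            if (!from.equals(toFullName)) {
                throw new IllegalStateException("Protobuf root message changed");
            }
        }
    }
}

class ValidatorLoader {
    // An empty or missing class name keeps the existing behaviour.
    static Validator load(String validatorClassName) {
        if (validatorClassName == null || validatorClassName.isEmpty()) {
            return new RootNameValidator();
        }
        try {
            return (Validator) Class.forName(validatorClassName)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load validator " + validatorClassName, e);
        }
    }

    public static void main(String[] args) {
        Validator validator = load("");
        validator.validate(List.of("demo.Person"), "demo.Person");
        System.out.println("loaded " + validator.getClass().getSimpleName());
    }
}
```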


I see this PIP as an enhancement that supplements the existing functionality,
so I don't think it is unnecessary or meaningless.


Thanks,
sinan





On Tue, Mar 7, 2023 at 11:53, 丛搏 wrote:

> I think we have two ways to do that.
>
> First way: We need to advance the improvement of java in protobuf. Ask
> if they have plans to improve.
>
> Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> be implemented as a plugin, don't change any existing plugin logic
> (it's simple and already used). I don't recommend adding flags for
> rollback, it adds configuration and makes little sense.
>
> Thanks,
> Bo
>
> On Mon, Mar 6, 2023 at 23:00, Asaf Mesika wrote:
>
> >
> > Can you convert the code block which is actually a quote in the
> > beginning of the PIP to something which doesn't require to scroll
> > horizontally so much?
> > Use
> >
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> >
> > Let's improve the clarity of what you wrote:
> >
> > "the PROTOBUF uses avro struct to store."
> > -->
> > When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> > to it as message data is an auto-generated POJO containing the
> annotations
> > encoding the schema. The client is using a converter, which converts a
> > Protobuf schema descriptor into an Avro schema and sends that as the
> Schema
> > of the producer/consumer.
> >
> > "On the broker side, protobuf and avro both use SchemaData converted to
> > org.apache.avro.Schema."
> > -->
> > Since the schema is an Avro schema, the implementation of compatibility
> > check on the broker side is to simply re-use the compatibility check of
> the
> > AVRO schema type.
> >
> > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > compatibility check it uses avro-protobuf.
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > But the current implementation of ProtobufNative schema compatibility
> > check only
> > checked if the root message name is changed."
> >
> > -->
> > PROTOBUF_NATIVE schema type is different.
> > The client is actually using Protobuf Descriptor as the schema, as
> opposed
> > to Avro schema of PROTOBUF schema type. In the broker, the
> PROTOBUF_NATIVE
> > compatibility check actually hasn't implemented any rule, besides one:
> > checking if the root message name has changed.
> >
> >
> >
> > >1. For now, there is no official or third-party solution for
> ProtoBuf
> > >compatibility. If in the future have better solutions of a third
> party or
> > >the official, we develop new ProtobufNativeSchemaValidator and use,
> so
> > >add a flag.
> > >
> > > Why do you need to make that configurable? Once you found a third
> party,
> > just switch to it? Who knows, maybe you never will. Introduce it when you
> > find it, not now.
> >
> >
> > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
> > > these strategies. As with the AVRO implementation, protobuf
> compatibility
> > > checking need implementing the canRead method. *This will check that
> > > the writtenschema can be read by readSchema.*
> >
> >
> > I completely disagree.
> > Avro implementation is confusing for our use case. Don't copy that.
> >
> > You have
> >
> > public void checkCompatible(SchemaData from, SchemaData to,
> > SchemaCompatibilityStrategy strategy)
> > throws IncompatibleSchemaException {
> > Descriptor fromDescriptor =
> > 

Re: [DISCUSS] PIP-253: Expose producer metrics for deadLetterProducer and retryLetterProducer

2023-03-06 Thread Kai Levy
`Optional<ProducerStats>` would work well, but I don't believe that will make
the implementation easier. The ConsumerStats class will need a way of
retrieving ProducerStats from the consumer at some point in the future,
which isn't possible right now, since deadLetterProducer and
retryLetterProducer are both private to ConsumerImpl.
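
Roughly, I imagine something like the following. This is a sketch with hypothetical names, not the actual ConsumerImpl code: `ProducerStatsStub`, `ConsumerStatsSketch`, and `ConsumerSketch` are stand-ins. The stats object asks the consumer for the producer's stats at call time, so it returns an empty `Optional` until the lazily created producer exists.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Placeholder for the real ProducerStats interface.
class ProducerStatsStub {
    final long numMsgsSent;

    ProducerStatsStub(long numMsgsSent) {
        this.numMsgsSent = numMsgsSent;
    }
}

// Stand-in for ConsumerStats: it cannot hold the producer's stats directly,
// because the producer may not exist yet when the stats object is created.
class ConsumerStatsSketch {
    private final Supplier<ProducerStatsStub> deadLetterStats;

    ConsumerStatsSketch(Supplier<ProducerStatsStub> deadLetterStats) {
        this.deadLetterStats = deadLetterStats;
    }

    // Hypothetical accessor: empty until the dead letter producer is created.
    Optional<ProducerStatsStub> getDeadLetterProducerStats() {
        return Optional.ofNullable(deadLetterStats.get());
    }
}

// Stand-in for ConsumerImpl with a lazily created dead letter producer.
class ConsumerSketch {
    private volatile ProducerStatsStub deadLetterProducerStats; // null until first use
    final ConsumerStatsSketch stats = new ConsumerStatsSketch(() -> deadLetterProducerStats);

    void sendToDeadLetterTopic() {
        long sentSoFar = deadLetterProducerStats == null ? 0 : deadLetterProducerStats.numMsgsSent;
        deadLetterProducerStats = new ProducerStatsStub(sentSoFar + 1);
    }

    public static void main(String[] args) {
        ConsumerSketch consumer = new ConsumerSketch();
        System.out.println(consumer.stats.getDeadLetterProducerStats().isPresent());
        consumer.sendToDeadLetterTopic();
        System.out.println(consumer.stats.getDeadLetterProducerStats().get().numMsgsSent);
    }
}
```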

Kai

On Mon, Mar 6, 2023 at 1:44 PM Michael Marshall 
wrote:

> I support exposing these stats, and I don't have a preference
> regarding where to put the method.
>
> > I believe the implementation will be harder that way
>
> Would it be sufficient to make the field return
> `Optional<ProducerStats>` or to return `null`? I don't think we have
> defined a convention stating we prefer to wrap fields with `Optional`,
> but I think it'd be a reasonable use of the wrapper and it'd make it
> clear to users that the value might not be present.
>
> Thanks,
> Michael
>
> On Mon, Mar 6, 2023 at 10:24 AM Kai Levy  wrote:
> >
> > I agree, adding it to the ConsumerStats interface makes more logical
> sense,
> > but I believe the implementation will be harder that way, since the
> > producers are lazily initialized. They won't be available when
> > ConsumerStats is created, and there isn't currently a way to access them
> > directly from the consumer.
> >
> > Kai
> >
> > On Sun, Mar 5, 2023 at 5:19 AM Asaf Mesika 
> wrote:
> >
> > > I would rather see them as attributes of ConsumerStats .
> > > Add
> > >
> > > ProducerStats deadLetterProducerStats;
> > >
> > > ProducerStats retryLetterProducerStats();
> > >
> > >
> > > On Fri, Mar 3, 2023 at 2:54 AM Kai Levy  wrote:
> > >
> > > > Hello!
> > > >
> > > > I created a new PIP because I discovered there's no way for a user to
> > > > access the metrics for a consumer's deadLetterProducer /
> > > > retryLetterProducer, since it is private to ConsumerImpl.java. I
> would
> > > like
> > > > to propose an API change that would expose those statistics. More
> details
> > > > on the github issue:
> > > > https://github.com/apache/pulsar/issues/19698
> > > >
> > > > Thanks!
> > > > Kai
> > > >
> > >
>
>


Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-06 Thread SiNan Liu
Thanks for the advice, Asaf.

1.

> For now, there is no official or third-party solution for ProtoBuf
> compatibility. If in the future have better solutions of a third party or
> the official, we develop new ProtobufNativeSchemaValidator and use, so add
> a flag.

The flag is unset by default, in which case the schema compatibility check
only compares the name of the root message. To use the implementation from this
PIP, set it to
`org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
That is what the flag does, and I'm not going to delete it. Users can keep the
previous behavior (just checking that the root message name is the same), but
that may not be enough, so they can opt in to the implementation from this PIP.
If a better third-party or official solution appears in the future, a new
validator can be developed and swapped in. The flag is needed in the PIP to
keep the implementation extensible.

2.

> Why not have a simple function for validation for each switch case above?
> Why do we need strategy and builder, and all this complexity?

I don't see how it's complicated. It's easy to understand and it's not
redundant. The validator's only job is to check whether two Protobuf schemas
are compatible. The builder builds checkers for the different compatibility
checking strategies. If everything were implemented in the validator, it would
be messy and there would be a lot of duplication, and a validator added later
would be hard to fit in. Discarding the encapsulation would make things more
complex and less extensible, so I won't change this design.
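
As a rough illustration of that split, here is a simplified Java sketch, not the code in the PIP. All names (`CompatStrategy`, `CompatSketch`, `canRead`, `build`) are made up for the example. A schema is reduced to a toy map of required field number to type, and the single rule is a placeholder that only models proto2-style required fields; the real rules are the ones on the protobuf website.

```java
import java.util.Map;
import java.util.function.BiPredicate;

enum CompatStrategy { BACKWARD, FORWARD, FULL }

class CompatSketch {
    // Placeholder rule: every required field the reader declares must be present
    // in the written schema with the same type.
    static boolean canRead(Map<Integer, String> written, Map<Integer, String> reader) {
        for (Map.Entry<Integer, String> field : reader.entrySet()) {
            if (!field.getValue().equals(written.get(field.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // "Builder": picks the direction(s) of the single canRead check for a strategy.
    // The returned predicate takes (currentSchema, newSchema).
    static BiPredicate<Map<Integer, String>, Map<Integer, String>> build(CompatStrategy strategy) {
        switch (strategy) {
            case BACKWARD:
                // The new schema must read data written with the current schema.
                return (current, next) -> canRead(current, next);
            case FORWARD:
                // The current schema must read data written with the new schema.
                return (current, next) -> canRead(next, current);
            default:
                // FULL: both directions.
                return (current, next) -> canRead(current, next) && canRead(next, current);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> current = Map.of(1, "int32");
        Map<Integer, String> next = Map.of(1, "int32", 2, "string");
        for (CompatStrategy strategy : CompatStrategy.values()) {
            System.out.println(strategy + ": " + build(strategy).test(current, next));
        }
    }
}
```

The validator logic lives in one place (`canRead`), and each strategy only decides in which direction to apply it.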

3. *Here are the basic compatibility rules we've defined:*
https://protobuf.dev/programming-guides/proto/#updating
These follow the rules published on the official website; I did not invent
them myself.

4. Other parts have been updated with explanations.


Thanks,
sinan








Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-06 Thread 丛搏
I think we have two ways to do that.

First way: We need to advance the improvement of java in protobuf. Ask
if they have plans to improve.

Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
be implemented as a plugin, don't change any existing plugin logic
(it's simple and already used). I don't recommend adding flags for
rollback, it adds configuration and makes little sense.

Thanks,
Bo

On Mon, Mar 6, 2023 at 23:00, Asaf Mesika wrote:

>
> Can you convert the code block which is actually a quote in the
> beginning of the PIP to something which doesn't require to scroll
> horizontally so much?
> Use
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>
> Let's improve the clarity of what you wrote:
>
> "the PROTOBUF uses avro struct to store."
> -->
> When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> to it as message data is an auto-generated POJO containing the annotations
> encoding the schema. The client is using a converter, which converts a
> Protobuf schema descriptor into an Avro schema and sends that as the Schema
> of the producer/consumer.
>
> "On the broker side, protobuf and avro both use SchemaData converted to
> org.apache.avro.Schema."
> -->
> Since the schema is an Avro schema, the implementation of compatibility
> check on the broker side is to simply re-use the compatibility check of the
> AVRO schema type.
>
> "ProtobufSchema is different from ProtobufNativeSchema in schema
> compatibility check it uses avro-protobuf.
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> But the current implementation of ProtobufNative schema compatibility
> check only
> checked if the root message name is changed."
>
> -->
> PROTOBUF_NATIVE schema type is different.
> The client is actually using Protobuf Descriptor as the schema, as opposed
> to Avro schema of PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE
> compatibility check actually hasn't implemented any rule, besides one:
> checking if the root message name has changed.
>
>
>
> >1. For now, there is no official or third-party solution for ProtoBuf
> >compatibility. If in the future have better solutions of a third party or
> >the official, we develop new ProtobufNativeSchemaValidator and use, so
> >add a flag.
> >
> > Why do you need to make that configurable? Once you found a third party,
> just switch to it? Who knows, maybe you never will. Introduce it when you
> find it, not now.
>
>
> We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
> > these strategies. As with the AVRO implementation, protobuf compatibility
> > checking need implementing the canRead method. *This will check that
> > the writtenschema can be read by readSchema.*
>
>
> I completely disagree.
> Avro implementation is confusing for our use case. Don't copy that.
>
> You have
>
> public void checkCompatible(SchemaData from, SchemaData to,
>         SchemaCompatibilityStrategy strategy)
>         throws IncompatibleSchemaException {
>     Descriptor fromDescriptor =
>             ProtobufNativeSchemaUtils.deserialize(from.getData());
>     Descriptor toDescriptor =
>             ProtobufNativeSchemaUtils.deserialize(to.getData());
>     switch (strategy) {
>         case BACKWARD_TRANSITIVE:
>         case BACKWARD:
>         case FORWARD_TRANSITIVE:
>         case FORWARD:
>         case FULL_TRANSITIVE:
>         case FULL:
>             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
>             return;
>         case ALWAYS_COMPATIBLE:
>             return;
>         default:
>             throw new IncompatibleSchemaException(
>                     "Unknown SchemaCompatibilityStrategy.");
>     }
> }
>
> I would rename :
> from --> currentSchema
> to --> newSchema
>
> Use that switch case and have a method for each like:
> validateBackwardsCompatibility(currentSchema, newSchema)
>
> I dislike canRead and usage of writtenSchema, since you have two completely
> different use cases: from the producing side and the consumer side.
>
> schemaValidatorBuilder
> >
> > I dislike this proposal. IMO Avro implementation is way too complicated.
> Why not have a simple function for validation for each switch case above?
> Why do we need strategy and builder, and all this complexity?
>
>
> *Here are the basic compatibility rules we've defined:*
>
>
> IMO it's impossible to read the validation rules as you described them.
> I wrote how they should be structured numerous times above.
> I can't validate them.
>
>
> IMO, the current design is very hard to read.
> Please try to avoid jumping into code sections.
> Write a high level design section, in which you describe in words what you
> plan to do.
> Write the validation rules in the structure that is easy to understand:
> rules per each compatibility check, and use proper words (current schema,
> new schema), since new schema can be once used for read 

Re: [Discussion] Allowing configure if function consumer should skip to latest

2023-03-06 Thread Neng Lu
Hi Penghui,

Thanks for your question.

One case is failure recovery for a windowing function.

A windowing function will not ack messages until its window is emitted. If the
window function fails due to issues such as OOM and restarts, it has a massive
backlog to catch up on. The function will then never be able to recover by
itself, since the backlog keeps growing and it keeps running out of memory.

Our user prefers an automatic way to recover, given that they are okay with
skipping some backlog data (this is acceptable in IoT cases). Also, users may
deploy hundreds of functions in their environment. Manually resetting the
cursor does not scale and is a heavy burden for the on-call person in such
cases.

Hope the above use case can help provide some more context regarding the change.

On 2023/03/03 03:51:35 PengHui Li wrote:
> Hi Neng,
> 
> Thanks for raising up the discussion
> 
> > In certain failure cases, the function needs to skip all the content
> between the last successfully acked message and the latest message in the
> topic in order to skip the huge backlog and quick recovery.
> 
> Do you have some real cases that can help us to understand it
> is necessary to introduce a new flag? Another possibility is users
> can use pulsar admin to reset the cursor to the latest position,
> Why will it not work for users? 
> 
> Regards,
> Penghui
> 
> > On Mar 1, 2023, at 10:16, Neng Lu  wrote:
> > 
> > In certain failure cases, the function needs to skip all the content
> > between the last successfully acked message and the latest message in the
> > topic in order to skip the huge backlog and quick recovery.
> 
> 


Re: [Vote] PIP-245: Make subscriptions of non-persistent topic non-durable

2023-03-06 Thread 丛搏
+1 (binding)

Thanks,
Bo

On Mon, Mar 6, 2023 at 19:10, guo jiwei wrote:
>
> +1 (binding)
>
> Regards
> Jiwei Guo (Tboy)
>
> On Mon, Mar 6, 2023 at 9:59 AM Yunze Xu  wrote:
> >
> > +1 (binding)
> >
> > Thanks,
> > Yunze
> >
> > On Fri, Mar 3, 2023 at 11:46 AM PengHui Li  wrote:
> > >
> > > +1 (binding)
> > >
> > > Penghui
> > >
> > > > On Feb 13, 2023, at 14:56, Jiuming Tao  
> > > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I would like to start a VOTE on `PIP-245: Make subscriptions of 
> > > > non-persistent topic non-durable`.
> > > >
> > > > Motivation:
> > > >
> > > > There are two types of subscriptions for a topic: Durable and 
> > > > Non-durable.
> > > >
> > > > We create a Consumer with a Durable subscription and a Reader with a 
> > > > Non-durable subscription.
> > > >
> > > > > But for NonPersistentTopic, creating a Durable subscription is
> > > > > meaningless: NonPersistentSubscription doesn't have a ManagedCursor to
> > > > > persist its data. After its consumer disconnects, the subscription
> > > > > can't be removed automatically unless we set the value of
> > > > > subscriptionExpirationTimeMinutes greater than 0.
> > > > >
> > > > > subscriptionExpirationTimeMinutes controls the subscription
> > > > > expiration of both NonPersistentTopic and PersistentTopic, so setting
> > > > > its value greater than 0 may lead to data loss (the durable
> > > > > subscriptions of a PersistentTopic can also be removed).
> > > > >
> > > > > Non-durable subscriptions are already removed automatically after
> > > > > all their consumers disconnect; this is the existing logic.
> > > > >
> > > > > To remove the subscriptions of a NonPersistentTopic that have no active
> > > > > consumers, and for the reasons above, we can make all
> > > > > the subscriptions of a NonPersistentTopic Non-durable.
> > > >
> > > >
> > > >
> > > > For more details, you can read: 
> > > > https://github.com/apache/pulsar/issues/19448 
> > > > 
> > > >
> > > > And the discuss thread is available at: 
> > > > https://lists.apache.org/thread/2ltmyglnb25jy8nk58twkwbglws43bst 
> > > > 
> > > >
> > > > Thanks,
> > > > Tao Jiuming
> > >


Re: [DISCUSS] new idea: reverse reading a topic

2023-03-06 Thread Yong Zhang
Hi Kannar,

I'm interested in what exactly your use case is.

Why do you need to read messages in reverse order?

Best,
Yong

On Mon, 6 Mar 2023 at 23:37, Alexandre DUVAL  wrote:

> Hi,
>
> I'm wondering if it is possible to introduce a new feature on Pulsar
> which will enable users to read a topic from a defined MessageId to
> previous messages until the beginning of the topic.
>
> I tried to use Pulsar SQL but it requires so much RAM even for little
> queries (due to Presto design).
>
> Currently, every read in Pulsar is expected to go forward. So it
> might be a bit tricky to prevent every weird behavior when introducing the
> feature.
>
> I'm currently trying to make an MVP/POC by introducing a readReverse
> field in the CommandSubscribe that is used by ReaderAPI, and I'm currently
> looking to create a getFirstMessageId() on ManagedLedger
> (https://github.com/CleverCloud/pulsar/pull/3). I also removed
> startPosition < endPosition sanity checks in BookKeeper locally
> (https://github.com/CleverCloud/bookkeeper/pull/2).
>
> We definitely prefer a readPrevious(), hasPreviousMessageAvailable() in
> the ReaderAPI.
>
> I'm not familiar with these internals such as NonDurableCursor,
> RangeEntryCache, ManagedCursor so it's a bit tricky.
>
> So I'm wondering whether someone could help/guide me or even directly handle the
> subject (or the discussion).
>
> Regards,
>
> Kannar
>
>
>


Re: [DISCUSS] PMC/Committer Emiratus status

2023-03-06 Thread Asaf Mesika
Tison,

The suggestion was stated a bit differently:

Quote:

Rather, I would recommend a project-level “active/inactive” flag that PMC
members can voluntarily apply to themselves. For example, do a PMC roll
call on private@pulsar.a.o and ask whether
current PMC members self describe as “active” or “inactive”. That status
could then be reflected on the Community section of the pulsar website. No
reply from a PMC member? Mark them inactive until they request a change.

End Quote


On Mon, Mar 6, 2023 at 6:13 PM tison  wrote:

> I won't object if any PMC member or committer is willing to set their
> personal status; e.g., open to topics of a specific domain, "inactive"
> or "don't disturb"
>
> Best,
> tison.
>
>
> On Tue, Mar 7, 2023 at 00:05, Asaf Mesika wrote:
>
> > Do others think it's a good thing to adopt P. Taylor Goetz's idea of an
> > active flag and the suggested process?
> >
> >
> > On Mon, Mar 6, 2023 at 2:03 PM tison  wrote:
> >
> > > Hi Yu,
> > >
> > > You can start by adding a page on the contribution guide or a table in the
> > > README of the main repo/site repo to state that you're an expert in the
> > > document domain.
> > >
> > > I have an in-house landscape about Pulsar modules, the broader ecosystem,
> > > and their active contributors/maintainers. Since people may not want to be
> > > referred to publicly by another person, I don't make it public.
> > >
> > > Finding experts can be a skill to collaborate in a community. You can find
> > > them when browsing relative PRs, analyzing the commit history, meeting them
> > > in the community, and having conversations. I don't have the motivation to
> > > maintain such a table publicly.
> > >
> > > Scala has a table of domain experts that can help[1]. If you like it, you
> > > can list yourself and try to bring other experts to list themselves. The
> > > ASF shares a sense that "The Foundation belongs to *you*". You're already a
> > > PMC member and able to drive such an effort.
> > >
> > > Best,
> > > tison.
> > >
> > > [1] https://github.com/scala/scala#get-in-touch
> > >
> > >
> > > On Mon, Mar 6, 2023 at 19:48, Yu wrote:
> > >
> > > > Hi Asaf,
> > > >
> > > > Thanks for bringing this up!
> > > >
> > > > If I may put my two pennies' worth:
> > > >
> > > > To be honest, this idea flashed across my mind previously. I talked
> > about
> > > > this to my colleague, and he was surprised that I was willing to be
> > > > deprived of benefits (at that time, I was a PMC member already).
> > > >
> > > > PMC members are vital promotors and driving forces of a community.
> > > Ideally,
> > > > they should be direction leaders and make great contributions
> > > > *continuously*. No one should enjoy the benefits of honor but not
> > > > contribute much *all the time*. Setting retirement bars for PMC
> members
> > > > reminds us to contribute and provide value. Maybe I'm a little
> > aggressive
> > > > :-)
> > > >
> > > > ~~
> > > >
> > > > +1 but a long list of PMC members with many inactive members does not
> > > > create a good feeling since "false prosperity" is no better than
> "real
> > > > contributions".
> > > >
> > > > > 3. Merit doesn’t expire.
> > > >
> > > > ~~
> > > >
> > > > Compared to my previous thought, Goetz has proposed a better idea
> > since:
> > > >
> > > > 1. It's mild and can be accepted by many PMC members. A kind of life
> > > wisdom
> > > > :-)
> > > >
> > > > 2. People who need help (e.g., PIP approvals / PR comments / ...)
> from
> > > PMC
> > > > members can check the flags to know who is available to help.
> > > >
> > > > Except for flags, I suggest adding "area of expertise" for PMC
> members
> > > and
> > > > committers, so people will know who are the most suitable experts to
> > ask
> > > > for help or collaborate.
> > > >
> > > > > 1. You can maintain active/inactive status at the project level
> with
> > a
> > > > simple flag on a community page, without removing people from the
> PMC.
> > > > > 2. By making it an informal, self-reported flag, you avoid the
> > overhead
> > > > of board resolutions, etc. and just manage it at the community level.
> > If
> > > > someone wants to change their status, they can just say so or submit
> a
> > > pull
> > > > request to change their status on the pulsar website.
> > > >
> > > > ~~
> > > >
> > > > Yu
> > > >
> > > > On Sun, Mar 5, 2023 at 10:31 PM Asaf Mesika 
> > > wrote:
> > > >
> > > > > Thanks to everyone who took the time to carefully answer with
> > detailed
> > > > > explanations.
> > > > > I personally learned a lot about Apache projects this way (made me
> > read
> > > > > about it some more).
> > > > >
> > > > > So my personal recap is:
> > > > >
> > > > >- The goal of knowing the health of the Apache Pulsar community
> > can
> > > be
> > > > >achieved by taking a look at monthly active contributors over
> time
> > > > >displayed on the community page.
> > > > >   - It could be nice getting those numbers on the 

Re: [DISCUSS] PIP-253: Expose producer metrics for deadLetterProducer and retryLetterProducer

2023-03-06 Thread Michael Marshall
I support exposing these stats, and I don't have a preference
regarding where to put the method.

> I believe the implementation will be harder that way

Would it be sufficient to make the field return
Optional or to return `null`? I don't think we have
defined a convention stating we prefer to wrap fields with `Optional`,
but I think it'd be a reasonable use of the wrapper and it'd make it
clear to users that the value might not be present.
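
To illustrate the `Optional` idea, here is a minimal sketch. It assumes a simplified stand-in `ProducerStatsSketch` interface and a hypothetical `ConsumerStatsSketch` class; neither is the real Pulsar type, and the method name `getDeadLetterProducerStats()` is only an assumption about what the PIP could expose.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for the real ProducerStats interface; only one counter is modelled.
interface ProducerStatsSketch {
    long getTotalMsgsSent();
}

// Hypothetical consumer-stats holder. The dead letter producer is created lazily,
// so its stats are looked up on each call instead of being captured at construction.
class ConsumerStatsSketch {
    private final Supplier<ProducerStatsSketch> deadLetterStatsLookup;

    ConsumerStatsSketch(Supplier<ProducerStatsSketch> deadLetterStatsLookup) {
        this.deadLetterStatsLookup = deadLetterStatsLookup;
    }

    // Empty until the dead letter producer has been initialized.
    Optional<ProducerStatsSketch> getDeadLetterProducerStats() {
        return Optional.ofNullable(deadLetterStatsLookup.get());
    }
}

class LazyStatsDemo {
    // Mutable holder standing in for the lazily created producer's stats.
    static ProducerStatsSketch lazyStats = null;

    public static void main(String[] args) {
        ConsumerStatsSketch stats = new ConsumerStatsSketch(() -> lazyStats);
        System.out.println(stats.getDeadLetterProducerStats().isPresent());
        lazyStats = () -> 3L; // the producer gets created later
        System.out.println(stats.getDeadLetterProducerStats().get().getTotalMsgsSent());
    }
}
```

Because the lookup is deferred, the stats object can be created before the producers exist, which may sidestep the lazy-initialization concern, though the real ConsumerImpl wiring could look quite different.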

Thanks,
Michael

On Mon, Mar 6, 2023 at 10:24 AM Kai Levy  wrote:
>
> I agree, adding it to the ConsumerStats interface makes more logical sense,
> but I believe the implementation will be harder that way, since the
> producers are lazily initialized. They won't be available when
> ConsumerStats is created, and there isn't currently a way to access them
> directly from the consumer.
>
> Kai
>
> On Sun, Mar 5, 2023 at 5:19 AM Asaf Mesika  wrote:
>
> > I would rather see them as attributes of ConsumerStats .
> > Add
> >
> > ProducerStats deadLetterProducerStats;
> >
> > ProducerStats retryLetterProducerStats();
> >
> >
> > On Fri, Mar 3, 2023 at 2:54 AM Kai Levy  wrote:
> >
> > > Hello!
> > >
> > > I created a new PIP because I discovered there's no way for a user to
> > > access the metrics for a consumer's deadLetterProducer /
> > > retryLetterProducer, since it is private to ConsumerImpl.java. I would
> > like
> > > to propose an API change that would expose those statistics. More details
> > > on the github issue:
> > > https://github.com/apache/pulsar/issues/19698
> > >
> > > Thanks!
> > > Kai
> > >
> >


Re: [DISCUSS] PIP-253: Expose producer metrics for deadLetterProducer and retryLetterProducer

2023-03-06 Thread Kai Levy
I agree, adding it to the ConsumerStats interface makes more logical sense,
but I believe the implementation will be harder that way, since the
producers are lazily initialized. They won't be available when
ConsumerStats is created, and there isn't currently a way to access them
directly from the consumer.

Kai

On Sun, Mar 5, 2023 at 5:19 AM Asaf Mesika  wrote:

> I would rather see them as attributes of ConsumerStats .
> Add
>
> ProducerStats deadLetterProducerStats;
>
> ProducerStats retryLetterProducerStats();
>
>
> On Fri, Mar 3, 2023 at 2:54 AM Kai Levy  wrote:
>
> > Hello!
> >
> > I created a new PIP because I discovered there's no way for a user to
> > access the metrics for a consumer's deadLetterProducer /
> > retryLetterProducer, since it is private to ConsumerImpl.java. I would
> like
> > to propose an API change that would expose those statistics. More details
> > on the github issue:
> > https://github.com/apache/pulsar/issues/19698
> >
> > Thanks!
> > Kai
> >
>


Re: [DISCUSS] PMC/Committer Emiratus status

2023-03-06 Thread tison
I won't object if any PMC member or committer is willing to set their
personal status; e.g., open to topics of a specific domain, "inactive"
or "don't disturb"

Best,
tison.


On Tue, Mar 7, 2023 at 00:05, Asaf Mesika wrote:

> Do others think it's a good thing to adopt P. Taylor Goetz's idea of an
> active flag and the suggested process?
>
>
> On Mon, Mar 6, 2023 at 2:03 PM tison  wrote:
>
> > Hi Yu,
> >
> > You can start by adding a page on the contribution guide or a table in the
> > README of the main repo/site repo to state that you're an expert in the
> > document domain.
> >
> > I have an in-house landscape about Pulsar modules, the broader ecosystem,
> > and their active contributors/maintainers. Since people may not want to be
> > referred to publicly by another person, I don't make it public.
> >
> > Finding experts can be a skill to collaborate in a community. You can find
> > them when browsing relative PRs, analyzing the commit history, meeting them
> > in the community, and having conversations. I don't have the motivation to
> > maintain such a table publicly.
> >
> > Scala has a table of domain experts that can help[1]. If you like it, you
> > can list yourself and try to bring other experts to list themselves. The
> > ASF shares a sense that "The Foundation belongs to *you*". You're already a
> > PMC member and able to drive such an effort.
> >
> > Best,
> > tison.
> >
> > [1] https://github.com/scala/scala#get-in-touch
> >
> >
> > On Mon, Mar 6, 2023 at 19:48, Yu wrote:
> >
> > > Hi Asaf,
> > >
> > > Thanks for bringing this up!
> > >
> > > If I may put my two pennies' worth:
> > >
> > > To be honest, this idea flashed across my mind previously. I talked
> about
> > > this to my colleague, and he was surprised that I was willing to be
> > > deprived of benefits (at that time, I was a PMC member already).
> > >
> > > PMC members are vital promotors and driving forces of a community.
> > Ideally,
> > > they should be direction leaders and make great contributions
> > > *continuously*. No one should enjoy the benefits of honor but not
> > > contribute much *all the time*. Setting retirement bars for PMC members
> > > reminds us to contribute and provide value. Maybe I'm a little
> aggressive
> > > :-)
> > >
> > > ~~
> > >
> > > +1 but a long list of PMC members with many inactive members does not
> > > create a good feeling since "false prosperity" is no better than "real
> > > contributions".
> > >
> > > > 3. Merit doesn’t expire.
> > >
> > > ~~
> > >
> > > Compared to my previous thought, Goetz has proposed a better idea
> since:
> > >
> > > 1. It's mild and can be accepted by many PMC members. A kind of life
> > wisdom
> > > :-)
> > >
> > > 2. People who need help (e.g., PIP approvals / PR comments / ...) from
> > PMC
> > > members can check the flags to know who is available to help.
> > >
> > > Except for flags, I suggest adding "area of expertise" for PMC members
> > and
> > > committers, so people will know who are the most suitable experts to
> ask
> > > for help or collaborate.
> > >
> > > > 1. You can maintain active/inactive status at the project level with
> a
> > > simple flag on a community page, without removing people from the PMC.
> > > > 2. By making it an informal, self-reported flag, you avoid the
> overhead
> > > of board resolutions, etc. and just manage it at the community level.
> If
> > > someone wants to change their status, they can just say so or submit a
> > pull
> > > request to change their status on the pulsar website.
> > >
> > > ~~
> > >
> > > Yu
> > >
> > > On Sun, Mar 5, 2023 at 10:31 PM Asaf Mesika 
> > wrote:
> > >
> > > > Thanks to everyone who took the time to carefully answer with
> detailed
> > > > explanations.
> > > > I personally learned a lot about Apache projects this way (made me
> read
> > > > about it some more).
> > > >
> > > > So my personal recap is:
> > > >
> > > >- The goal of knowing the health of the Apache Pulsar community
> can
> > be
> > > >achieved by taking a look at monthly active contributors over time
> > > >displayed on the community page.
> > > >   - It could be nice getting those numbers on the mailing list
> > itself
> > > >   as well.
> > > >- Calculating the engagement is not an easy task.
> > > >- Kicking people off is not something you'd like to do in general
> > and
> > > >specifically for volunteers.
> > > >- People's credit for work, which is also expressed in PMC
> > membership
> > > >never expires due to Merit never expires - your work credit and
> > earned
> > > >right should not expire.
> > > >
> > > >
> > > > I personally see PMC members answering someone not a PMC member nor a
> > > > > committer on this topic as a very healthy community indicator :)
> > > >
> > > > Thanks !
> > > >
> > > > Asaf
> > > >
> > > > On Fri, Mar 3, 2023 at 10:22 AM Enrico Olivelli  >
> > > > wrote:
> > > >
> > > > > This is an interesting discussion.
> > > > > Good to see this kind of a discussion on 

Re: [DISCUSS] PMC/Committer Emiratus status

2023-03-06 Thread Asaf Mesika
Do others think it's a good thing to adopt P. Taylor Goetz's idea of an
active flag and the suggested process?


On Mon, Mar 6, 2023 at 2:03 PM tison  wrote:

> Hi Yu,
>
> You can start by adding a page on the contribution guide or a table in the
> README of the main repo/site repo to state that you're an expert in the
> document domain.
>
> I have an in-house landscape about Pulsar modules, the broader ecosystem,
> and their active contributors/maintainers. Since people may not want to be
> referred to publicly by another person, I don't make it public.
>
> Finding experts can be a skill to collaborate in a community. You can find
> them when browsing relative PRs, analyzing the commit history, meeting them
> in the community, and having conversations. I don't have the motivation to
> maintain such a table publicly.
>
> Scala has a table of domain experts that can help[1]. If you like it, you
> can list yourself and try to bring other experts to list themselves. The
> ASF shares a sense that "The Foundation belongs to *you*". You're already a
> PMC member and able to drive such an effort.
>
> Best,
> tison.
>
> [1] https://github.com/scala/scala#get-in-touch
>
>
> On Mon, Mar 6, 2023 at 19:48, Yu wrote:
>
> > Hi Asaf,
> >
> > Thanks for bringing this up!
> >
> > If I may put my two pennies' worth:
> >
> > To be honest, this idea flashed across my mind previously. I talked about
> > this to my colleague, and he was surprised that I was willing to be
> > deprived of benefits (at that time, I was a PMC member already).
> >
> > PMC members are vital promotors and driving forces of a community.
> Ideally,
> > they should be direction leaders and make great contributions
> > *continuously*. No one should enjoy the benefits of honor but not
> > contribute much *all the time*. Setting retirement bars for PMC members
> > reminds us to contribute and provide value. Maybe I'm a little aggressive
> > :-)
> >
> > ~~
> >
> > +1 but a long list of PMC members with many inactive members does not
> > create a good feeling since "false prosperity" is no better than "real
> > contributions".
> >
> > > 3. Merit doesn’t expire.
> >
> > ~~
> >
> > Compared to my previous thought, Goetz has proposed a better idea since:
> >
> > 1. It's mild and can be accepted by many PMC members. A kind of life
> wisdom
> > :-)
> >
> > 2. People who need help (e.g., PIP approvals / PR comments / ...) from
> PMC
> > members can check the flags to know who is available to help.
> >
> > Except for flags, I suggest adding "area of expertise" for PMC members
> and
> > committers, so people will know who are the most suitable experts to ask
> > for help or collaborate.
> >
> > > 1. You can maintain active/inactive status at the project level with a
> > simple flag on a community page, without removing people from the PMC.
> > > 2. By making it an informal, self-reported flag, you avoid the overhead
> > of board resolutions, etc. and just manage it at the community level. If
> > someone wants to change their status, they can just say so or submit a
> pull
> > request to change their status on the pulsar website.
> >
> > ~~
> >
> > Yu
> >
> > On Sun, Mar 5, 2023 at 10:31 PM Asaf Mesika 
> wrote:
> >
> > > Thanks to everyone who took the time to carefully answer with detailed
> > > explanations.
> > > I personally learned a lot about Apache projects this way (made me read
> > > about it some more).
> > >
> > > So my personal recap is:
> > >
> > >- The goal of knowing the health of the Apache Pulsar community can
> be
> > >achieved by taking a look at monthly active contributors over time
> > >displayed on the community page.
> > >   - It could be nice getting those numbers on the mailing list
> itself
> > >   as well.
> > >- Calculating the engagement is not an easy task.
> > >- Kicking people off is not something you'd like to do in general
> and
> > >specifically for volunteers.
> > >- People's credit for work, which is also expressed in PMC
> membership
> > >never expires due to Merit never expires - your work credit and
> earned
> > >right should not expire.
> > >
> > >
> > > I personally see PMC members answering someone not a PMC member nor a
> > > > committer on this topic as a very healthy community indicator :)
> > >
> > > Thanks !
> > >
> > > Asaf
> > >
> > > On Fri, Mar 3, 2023 at 10:22 AM Enrico Olivelli 
> > > wrote:
> > >
> > > > This is an interesting discussion.
> > > > Good to see this kind of a discussion on the dev@ mailing list, this
> > > > way more people are aware of the fact that we are a project in the
> ASF
> > > > and there is a Project Management Committee.
> > > >
> > > > I have been following a few Apache projects for a while, and I
> believe
> > > > that this kind of discussions should be run on the private@ mailing
> > > > list.
> > > > It is the PMC that usually deals with this stuff.
> > > >
> > > > As Tison said, the common practice is that you never remove anyone
> > > > from a PMC 

Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-06 Thread Asaf Mesika
>
> Pulsar has a feature called backlog quota (place link).

You need to place a link :)

Expose pulsar_storage_backlog_quota_count in the topic level

You already have "pulsar_storage_backlog_size", so what do you need this
metric for?

backlogQuotaLimitSize

should be `backlogQuotaSizeBytes`

backlogQuotaLimitTime

should be `backlogQuotaTimeSeconds`

What about goal no.4? Expose oldest unacknowledged message subscription
name?

IMO, metrics are like API - perhaps indicate the change there as well

> Record the event when dropBacklogForSizeLimit or dropBacklogForTimeLimit is
> going to be invoked.


Oh, now I get it.
So you need to rename the metric.
"pulsar_storage_backlog_quota_count" -->
`pulsar_storage_backlog_eviction_count`


> the topic's existing subscription.

"subscription" --> "subscription*s*"

Number of backlog quota happends.

Number of times backlog evictions happened due to exceeding backlog quota
(either time or size).


>1. Find the backlog subscriptions
>After receiving the alarm, users could request Topics#getStats(topicName,
>true/false, true, true) to
>Pulsar exposed backlogSize and earliestMsgPublishTimeInBacklog in the
>subscription level, and we will expose backlogQuotaLimitSize and
>backlogQuotaLimitTime in the topic level, so users could find which
>subscriptions in backlog easily.
>
> I wrote how it should be done IMO in a previous email.


On Mon, Mar 6, 2023 at 1:20 PM 太上玄元道君  wrote:

> Hi Asaf,
> I've updated the PIP, PTAL
>
> Thanks,
> Tao Jiuming
>
> On Sun, Mar 5, 2023 at 21:00, Asaf Mesika wrote:
>
> > On Thu, Mar 2, 2023 at 12:57 PM 太上玄元道君  wrote:
> >
> > > > I  think you should fix this explanation:
> > >
> > > Thanks! I would like to copy the context you provide to the PIP
> > motivation,
> > > your description is more detailed, so developers don't have to go
> through
> > > the code.
> > >
> >
> > Sure
> >
> >
> > >
> > > > Today the quota is checked periodically, right? So that's how the
> > > operator
> > > > knows the cost in terms of I/O is limited.
> > > > Now you are adding one additional I/O per collection, every 1 min by
> > > > default. That's a lot perhaps. How long is the check interval today?
> > >
> > > Actually, I don't want to introduce additional costs, I thought we
> > > could cache its result, so that it won't introduce additional costs.
> > > It may be that I did not make it clear in the PIP and caused this
> > > misunderstanding, sorry.
> > >
> >
> > Ok, just to verify: You plan to modify the code that runs periodically
> the
> > backlog quota check, so the result will be cached there? This way when
> you
> > pull that information from that code every 1min to expose it as a metric
> it
> > will have 0 I/O cost?
> >
> >
> >
> > >
> > > > The user today can calculate quota used for size based limit, since
> > there
> > > > are two metrics that are exposed today on a topic level: "
> > > > pulsar_storage_backlog_quota_limit" and
> "pulsar_storage_backlog_size".
> > > You
> > > > can just divide the two to get a percentage.
> > > > For the time-based limit, the only metric exposed today is quota
> > itself ,
> > > "
> > > > pulsar_storage_backlog_quota_limit_time".
> > >
> > > I only noticed `pulsar_storage_backlog_size` but missed
> > > `pulsar_storage_backlog_quota_limit` and
> > > `pulsar_storage_backlog_quota_limit_time`. Many thanks for your
> reminder.
> > >
> > >
> > > So, in this condition, we already have the following topic-level
> metrics:
> > > `pulsar_storage_backlog_size`: The total backlog size of the topics of
> > this
> > > topic owned by this broker (in bytes).
> > > `pulsar_storage_backlog_quota_limit`: The total amount of the data in
> > this
> > > topic that limits the backlog quota (bytes).
> > > `pulsar_storage_backlog_quota_limit_time`: The backlog quota limit in
> > > time(seconds). (This metric does not exists in the doc, need to
> improve)
> > >
> > >
> > > We just need to add a new metric named
> > > `pulsar_storage_earliest_msg_publish_time_in_backlog` in the
> topic-level
> > > that indicates the publish time of the earliest message in the backlog.
> > > So users could get `pulsar_backlog_size_quota_used_percentage` by
> divide
> > > `pulsar_storage_backlog_size ` and
> > > `pulsar_storage_backlog_quota_limit`(`pulsar_storage_backlog_size` /
> > > `pulsar_storage_backlog_quota_limit`),
> > > and could get `pulsar_backlog_time_quota_used_percentage` by divide
> `now
> > -
> > > 

[DISCUSS] new idea: reverse reading a topic

2023-03-06 Thread Alexandre DUVAL

Hi,

I'm wondering if it is possible to introduce a new feature on Pulsar
which will enable users to read a topic from a defined MessageId to
previous messages until the beginning of the topic.


I tried to use Pulsar SQL but it requires so much RAM even for little 
queries (due to Presto design).


Currently, every read in Pulsar is expected to go forward. So it
might be a bit tricky to prevent every weird behavior when introducing the
feature.


I'm currently trying to make an MVP/POC by introducing a readReverse
field in the CommandSubscribe that is used by ReaderAPI, and I'm currently
looking to create a getFirstMessageId() on ManagedLedger
(https://github.com/CleverCloud/pulsar/pull/3). I also removed 
startPosition < endPosition sanity checks in BookKeeper locally 
(https://github.com/CleverCloud/bookkeeper/pull/2).


We definitely prefer a readPrevious(), hasPreviousMessageAvailable() in 
the ReaderAPI.
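
To show the shape of the API meant here, below is a minimal in-memory sketch. `ReverseReaderSketch` is hypothetical: it uses a list index in place of a MessageId, it assumes the start position is included in the results, and it does not touch ManagedLedger, cursors, or BookKeeper at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical reverse reader over an in-memory list that stands in for a topic.
class ReverseReaderSketch<T> {
    private final List<T> topic; // messages, oldest first
    private int next;            // index of the next message to return, -1 when exhausted

    ReverseReaderSketch(List<T> topic, int startIndex) {
        if (startIndex < 0 || startIndex >= topic.size()) {
            throw new IllegalArgumentException("start position is outside the topic");
        }
        this.topic = topic;
        this.next = startIndex;
    }

    boolean hasPreviousMessageAvailable() {
        return next >= 0;
    }

    // Returns the message at the current position, then moves one step towards the beginning.
    T readPrevious() {
        if (!hasPreviousMessageAvailable()) {
            throw new NoSuchElementException("reached the beginning of the topic");
        }
        return topic.get(next--);
    }
}

class ReverseReaderDemo {
    // Reads everything from the reader's start position back to the first message.
    static List<String> drain(ReverseReaderSketch<String> reader) {
        List<String> out = new ArrayList<>();
        while (reader.hasPreviousMessageAvailable()) {
            out.add(reader.readPrevious());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> topic = Arrays.asList("m0", "m1", "m2", "m3");
        System.out.println(drain(new ReverseReaderSketch<>(topic, 2)));
    }
}
```

The two methods mirror the forward-reading pair on the existing Reader, which is why this shape is preferred over a flag that changes the meaning of the existing calls.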


I'm not familiar with these internals such as NonDurableCursor, 
RangeEntryCache, ManagedCursor so it's a bit tricky.


So I'm wondering whether someone could help/guide me or even directly handle the
subject (or the discussion).


Regards,

Kannar




Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-06 Thread Asaf Mesika
Can you convert the code block, which is actually a quote, at the
beginning of the PIP to something which doesn't require scrolling
horizontally so much?
Use
https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text

Let's improve the clarity of what you wrote:

"the PROTOBUF uses avro struct to store."
-->
When Schema type PROTOBUF is used, Pulsar Client assumes the object given
to it as message data is an auto-generated POJO containing the annotations
encoding the schema. The client is using a converter, which converts a
Protobuf schema descriptor into an Avro schema and sends that as the Schema
of the producer/consumer.

"On the broker side, protobuf and avro both use SchemaData converted to
org.apache.avro.Schema."
-->
Since the schema is an Avro schema, the implementation of compatibility
check on the broker side is to simply re-use the compatibility check of the
AVRO schema type.

"ProtobufSchema is different from ProtobufNativeSchema in schema
compatibility check it uses avro-protobuf.
https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
But the current implementation of ProtobufNative schema compatibility
check only
checked if the root message name is changed."

-->
PROTOBUF_NATIVE schema type is different.
The client is actually using Protobuf Descriptor as the schema, as opposed
to Avro schema of PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE
compatibility check actually hasn't implemented any rule, besides one:
checking if the root message name has changed.



>1. For now, there is no official or third-party solution for ProtoBuf
>compatibility. If in the future have better solutions of a third party or
>the official, we develop new ProtobufNativeSchemaValidator and use, so
>add a flag.
>
> Why do you need to make that configurable? Once you find a third party,
just switch to it. Who knows, maybe you never will. Introduce it when you
find it, not now.


We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
> these strategies. As with the AVRO implementation, protobuf compatibility
> checking need implementing the canRead method. *This will check that
> the writtenschema can be read by readSchema.*


I completely disagree.
Avro implementation is confusing for our use case. Don't copy that.

You have

public void checkCompatible(SchemaData from, SchemaData to,
        SchemaCompatibilityStrategy strategy)
        throws IncompatibleSchemaException {
    Descriptor fromDescriptor =
            ProtobufNativeSchemaUtils.deserialize(from.getData());
    Descriptor toDescriptor =
            ProtobufNativeSchemaUtils.deserialize(to.getData());
    switch (strategy) {
        case BACKWARD_TRANSITIVE:
        case BACKWARD:
        case FORWARD_TRANSITIVE:
        case FORWARD:
        case FULL_TRANSITIVE:
        case FULL:
            checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
            return;
        case ALWAYS_COMPATIBLE:
            return;
        default:
            throw new IncompatibleSchemaException(
                    "Unknown SchemaCompatibilityStrategy.");
    }
}

I would rename:
from --> currentSchema
to --> newSchema

Use that switch case and have a method for each like:
validateBackwardsCompatibility(currentSchema, newSchema)

I dislike canRead and usage of writtenSchema, since you have two completely
different use cases: from the producing side and the consumer side.
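
To make the structure I have in mind concrete, here is a minimal sketch. Everything in it is a simplified, hypothetical stand-in (`SchemaSketch`, `StrategySketch`, `IncompatibleSketchException`), the transitive strategies are left out, and the only rule it applies is the root message name check the current code already has. The strategy-specific rules are deliberately left as comments, since defining them is the PIP's job.

```java
// Hypothetical stand-in for a deserialized protobuf Descriptor; only the root
// message name is modelled, because that is the one rule the current check applies.
class SchemaSketch {
    final String rootMessageName;

    SchemaSketch(String rootMessageName) {
        this.rootMessageName = rootMessageName;
    }
}

// Unchecked only to keep the sketch short; the quoted method declares
// `throws IncompatibleSchemaException` instead.
class IncompatibleSketchException extends RuntimeException {
    IncompatibleSketchException(String message) {
        super(message);
    }
}

enum StrategySketch { BACKWARD, FORWARD, FULL, ALWAYS_COMPATIBLE }

class CompatibilityCheckSketch {

    static void checkCompatible(SchemaSketch currentSchema, SchemaSketch newSchema,
                                StrategySketch strategy) {
        switch (strategy) {
            case BACKWARD:
                validateBackwardCompatibility(currentSchema, newSchema);
                return;
            case FORWARD:
                validateForwardCompatibility(currentSchema, newSchema);
                return;
            case FULL:
                validateBackwardCompatibility(currentSchema, newSchema);
                validateForwardCompatibility(currentSchema, newSchema);
                return;
            case ALWAYS_COMPATIBLE:
                return;
            default:
                throw new IncompatibleSketchException("Unknown strategy: " + strategy);
        }
    }

    // Rules for "consumers on newSchema can read data written with currentSchema" go here.
    static void validateBackwardCompatibility(SchemaSketch currentSchema, SchemaSketch newSchema) {
        checkRootMessageName(currentSchema, newSchema);
    }

    // Rules for "consumers on currentSchema can read data written with newSchema" go here.
    static void validateForwardCompatibility(SchemaSketch currentSchema, SchemaSketch newSchema) {
        checkRootMessageName(currentSchema, newSchema);
    }

    private static void checkRootMessageName(SchemaSketch currentSchema, SchemaSketch newSchema) {
        if (!currentSchema.rootMessageName.equals(newSchema.rootMessageName)) {
            throw new IncompatibleSketchException("Root message name changed from "
                    + currentSchema.rootMessageName + " to " + newSchema.rootMessageName);
        }
    }
}
```

Each strategy then reads as one plain method taking the current and new schema, with no builder or reader/writer terminology needed.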

schemaValidatorBuilder
>
> I dislike this proposal. IMO Avro implementation is way too complicated.
Why not have a simple function for validation for each switch case above?
Why do we need strategy and builder, and all this complexity?


*Here are the basic compatibility rules we've defined:*


IMO it's impossible to read the validation rules as you described them.
I wrote how they should be structured numerous times above.
I can't validate them.


IMO, the current design is very hard to read.
Please try to avoid jumping into code sections.
Write a high level design section, in which you describe in words what you
plan to do.
Write the validation rules in a structure that is easy to understand:
rules per each compatibility check, and use proper words (current schema,
new schema), since new schema can be once used for read and once used for
write.

In its current form it takes too much time to understand the design, and it
shouldn't be the case.

Thanks,

Asaf


>



On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu  wrote:

> Hi! I updated the explanation of some things in the PIP issue. I also
> added a new "flag" in the conf that selects between the different
> ProtobufNativeSchemaValidator implementations, and made the default
> ProtobufNativeSchemaValidator only check whether the name of the
> root message is the same.
>
>
> Thanks,
> sinan
>
>
> On Sun, Mar 5, 2023 at 8:21 PM Asaf Mesika wrote:
>
> > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu  wrote:
> >
> > > >
> > > > Can you please explain how a 

Re: [DISCUSS] PMC/Committer Emiratus status

2023-03-06 Thread tison
Hi Yu,

You can start by adding a page on the contribution guide or a table in the
README of the main repo/site repo to state that you're an expert in the
document domain.

I have an in-house landscape about Pulsar modules, the broader ecosystem,
and their active contributors/maintainers. Since people may not want to be
referred to publicly by another person, I don't make it public.

Finding experts is itself a skill for collaborating in a community. You can find
them when browsing related PRs, analyzing the commit history, meeting them
in the community, and having conversations. I don't have the motivation to
maintain such a table publicly.

Scala has a table of domain experts that can help[1]. If you like it, you
can list yourself and try to bring other experts to list themselves. The
ASF shares a sense that "The Foundation belongs to *you*". You're already a
PMC member and able to drive such an effort.

Best,
tison.

[1] https://github.com/scala/scala#get-in-touch


On Mon, Mar 6, 2023 at 7:48 PM Yu wrote:

> Hi Asaf,
>
> Thanks for bringing this up!
>
> If I may put my two pennies' worth:
>
> To be honest, this idea flashed across my mind previously. I talked about
> this to my colleague, and he was surprised that I was willing to be
> deprived of benefits (at that time, I was a PMC member already).
>
> PMC members are vital promoters and driving forces of a community. Ideally,
> they should be direction leaders and make great contributions
> *continuously*. No one should enjoy the benefits of honor but not
> contribute much *all the time*. Setting retirement bars for PMC members
> reminds us to contribute and provide value. Maybe I'm a little aggressive
> :-)
>
> ~~
>
> +1 but a long list of PMC members with many inactive members does not
> create a good feeling since "false prosperity" is no better than "real
> contributions".
>
> > 3. Merit doesn’t expire.
>
> ~~
>
> Compared to my previous thought, Goetz has proposed a better idea since:
>
> 1. It's mild and can be accepted by many PMC members. A kind of life wisdom
> :-)
>
> 2. People who need help (e.g., PIP approvals / PR comments / ...) from PMC
> members can check the flags to know who is available to help.
>
> Besides flags, I suggest adding an "area of expertise" for PMC members and
> committers, so people will know who the most suitable experts are to ask
> for help or collaborate with.
>
> > 1. You can maintain active/inactive status at the project level with a
> > simple flag on a community page, without removing people from the PMC.
> > 2. By making it an informal, self-reported flag, you avoid the overhead
> > of board resolutions, etc. and just manage it at the community level. If
> > someone wants to change their status, they can just say so or submit a pull
> > request to change their status on the pulsar website.
>
> ~~
>
> Yu
>
> On Sun, Mar 5, 2023 at 10:31 PM Asaf Mesika  wrote:
>
> > Thanks to everyone who took the time to carefully answer with detailed
> > explanations.
> > I personally learned a lot about Apache projects this way (made me read
> > about it some more).
> >
> > So my personal recap is:
> >
> > - The goal of knowing the health of the Apache Pulsar community can be
> >   achieved by taking a look at monthly active contributors over time
> >   displayed on the community page.
> >   - It could be nice getting those numbers on the mailing list itself
> >     as well.
> > - Calculating the engagement is not an easy task.
> > - Kicking people off is not something you'd like to do in general, and
> >   specifically not for volunteers.
> > - People's credit for their work, which is also expressed in PMC
> >   membership, never expires, because merit never expires: your work
> >   credit and earned right should not expire.
> >
> >
> > I personally see PMC members answering someone who is neither a PMC member
> > nor a committer on this topic as a very healthy community indicator :)
> >
> > Thanks !
> >
> > Asaf
> >
> > On Fri, Mar 3, 2023 at 10:22 AM Enrico Olivelli 
> > wrote:
> >
> > > This is an interesting discussion.
> > > Good to see this kind of a discussion on the dev@ mailing list, this
> > > way more people are aware of the fact that we are a project in the ASF
> > > and there is a Project Management Committee.
> > >
> > > I have been following a few Apache projects for a while, and I believe
> > > that this kind of discussion should be run on the private@ mailing
> > > list.
> > > It is the PMC that usually deals with this stuff.
> > >
> > > As Tison said, the common practice is that you never remove anyone
> > > from a PMC or from the Committers list.
> > >
> > > This happens only in rare cases where an individual behaves in such a
> > > way that the Project or the Foundation could be damaged,
> > > for instance if you speak on behalf of the project and you offend
> > > someone publicly.
> > >
> > > Inactive contributors/committers/PMC members do not do any harm to a
> > > project.
> > >
> > > Some projects have some rules that 

Re: [DISCUSS] Using bouncycastle fips instead bouncycastle non-fips

2023-03-06 Thread Asaf Mesika
So it means the change is only on the client side, not the broker side?


On Fri, Mar 3, 2023 at 11:42 AM Zixuan Liu  wrote:

> Hi all,
>
> We only use the BC to encrypt the message, not TLS, so I think we can
> migrate to the BC-FIPS.
>
> If you think it's feasible, I'll try to do it, and if it doesn't pass the
> Pulsar test, I'll keep using the BC.
>
> Thanks,
> Zixuan
>
>
>
> On Thu, Mar 2, 2023 at 12:40 AM YuWei Sung wrote:
>
> > BC and BC-FIPS differ in their cipher suites. This is similar to TLS 1.1
> > vs 1.2 vs 1.3. Some suites are deprecated (not secure enough due to
> > compute power improvements).
> > In TLS 1.3, a client has no chance to specify weak cipher suites to
> > connect to the server and exploit the weakness.
> > For a BC-FIPS hardened Pulsar cluster, brokers should reject connections
> > from clients with BC (clients must use Security.provider bc-fips).
> > For a BC non-FIPS cluster, it should be flexible: a client with bc-fips or
> > bc should be able to connect to Pulsar (bc).
> >
> > 
> >
> >
> > Yu Wei Sung
> >
> > Sr. Solutions Engineer
> >
> >
> > streamnative.io
> >
> > 
> > 
> > 
> >
> >
> > On Wed, Mar 1, 2023 at 10:28 AM Zixuan Liu  wrote:
> >
> > > > Actually I was expecting that part of the discussion will specify the
> > > > difference between using FIPS compared with non-FIPS, in each
> > > > BouncyCastle usage: TLS and message encryption.
> > >
> > > Good catch! I'll check this.
> > >
> > > On Wed, Mar 1, 2023 at 9:19 PM Asaf Mesika wrote:
> > >
> > > > On Mon, Feb 27, 2023 at 4:35 PM Zixuan Liu wrote:
> > > >
> > > > > > users might get exceptions if they don't use specific algorithms
> > > > > > or encryption schemes?
> > > > >
> > > > > Could you share more info about this?
> > > > >
> > > >
> > > > Actually I was expecting that part of the discussion will specify the
> > > > difference between using FIPS compared with non-FIPS, in each
> > > > BouncyCastle usage: TLS and message encryption.
> > > >
> > > >  I imagined that FIPS has a shorter list of ciphers it supports.
> > > >
> > > >
> > > >
> > > > > On Mon, Feb 27, 2023 at 6:01 PM Asaf Mesika wrote:
> > > > >
> > > > > > So if I understand you correctly, once you switch to the FIPS
> > > > > > version of Bouncy Castle, users might get exceptions if they don't
> > > > > > use specific algorithms or encryption schemes?
> > > > > > Potentially a breaking change?
> > > > > > You can't switch it off via config?
> > > > > >
> > > > > > On Wed, Feb 22, 2023 at 3:56 PM Zixuan Liu wrote:
> > > > > >
> > > > > > > > 1. What is FIPS?
> > > > > > >
> > > > > > > FIPS (Federal Information Processing Standards) are a set of
> > > > > > > standards that describe document processing, encryption
> > > > > > > algorithms and other information technology standards for use
> > > > > > > within non-military government agencies and by government
> > > > > > > contractors and vendors who work with the agencies.
> > > > > > >
> > > > > > > > 2. Why is the FIPS version safer exactly?
> > > > > > >
> > > > > > > The FIPS standard is strict. The FIPS version of the library is
> > > > > > > likewise strict and follows the standard.
> > > > > > >
> > > > > > > > 3. What is bouncycastle used exactly in Pulsar?
> > > > > > >
> > > > > > > We use bouncycastle as the TLS provider and for end-to-end
> > > > > > > message encryption.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Zixuan
> > > > > > >
> > > > > > > On Wed, Feb 22, 2023 at 9:23 PM Asaf Mesika wrote:
> > > > > > >
> > > > > > > > Can you elaborate a bit:
> > > > > > > > 1. What is FIPS?
> > > > > > > > 2. Why is the FIPS version safer exactly?
> > > > > > > > 3. What is bouncycastle used exactly in Pulsar?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Wed, Feb 22, 2023 at 11:58 AM Zixuan Liu <node...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I would like to discuss using the bouncycastle fips instead
> > > > > > > > > > of the bouncycastle non-fips.
> > > > > > > > > >
> > > > > > > > > > The bouncycastle is a Java library that complements the
> > > > > > > > > > default Java Cryptography Extension (JCE). It has two
> > > > > > > > > > versions: a fips version and a non-fips version.
> > > > > > > > > >
> > > > > > > > > > The fips version is safer than non-fips. When the security
> > > > > > > > > > level is very high, many policies require the fips version,
> > > > > > > > > > but Pulsar uses the non-fips version by default. Switching
> > > > > > > > > > this is complex, because the `pulsar-client-messagecrypto-bc`
> > > > > > > > > > module and the root project depend on the non-fips, so I 

Re: [DISCUSS] PMC/Committer Emiratus status

2023-03-06 Thread Yu
Hi Asaf,

Thanks for bringing this up!

If I may put my two pennies' worth:

To be honest, this idea flashed across my mind previously. I talked about
this to my colleague, and he was surprised that I was willing to be
deprived of benefits (at that time, I was a PMC member already).

PMC members are vital promoters and driving forces of a community. Ideally,
they should be direction leaders and make great contributions
*continuously*. No one should enjoy the benefits of honor but not
contribute much *all the time*. Setting retirement bars for PMC members
reminds us to contribute and provide value. Maybe I'm a little aggressive
:-)

~~

+1 but a long list of PMC members with many inactive members does not
create a good feeling since "false prosperity" is no better than "real
contributions".

> 3. Merit doesn’t expire.

~~

Compared to my previous thought, Goetz has proposed a better idea since:

1. It's mild and can be accepted by many PMC members. A kind of life wisdom
:-)

2. People who need help (e.g., PIP approvals / PR comments / ...) from PMC
members can check the flags to know who is available to help.

Besides flags, I suggest adding an "area of expertise" for PMC members and
committers, so people will know who the most suitable experts are to ask
for help or collaborate with.

> 1. You can maintain active/inactive status at the project level with a
> simple flag on a community page, without removing people from the PMC.
> 2. By making it an informal, self-reported flag, you avoid the overhead
> of board resolutions, etc. and just manage it at the community level. If
> someone wants to change their status, they can just say so or submit a pull
> request to change their status on the pulsar website.

~~

Yu

On Sun, Mar 5, 2023 at 10:31 PM Asaf Mesika  wrote:

> Thanks to everyone who took the time to carefully answer with detailed
> explanations.
> I personally learned a lot about Apache projects this way (made me read
> about it some more).
>
> So my personal recap is:
>
> - The goal of knowing the health of the Apache Pulsar community can be
>   achieved by taking a look at monthly active contributors over time
>   displayed on the community page.
>   - It could be nice getting those numbers on the mailing list itself
>     as well.
> - Calculating the engagement is not an easy task.
> - Kicking people off is not something you'd like to do in general, and
>   specifically not for volunteers.
> - People's credit for their work, which is also expressed in PMC
>   membership, never expires, because merit never expires: your work
>   credit and earned right should not expire.
>
>
> I personally see PMC members answering someone who is neither a PMC member
> nor a committer on this topic as a very healthy community indicator :)
>
> Thanks !
>
> Asaf
>
> On Fri, Mar 3, 2023 at 10:22 AM Enrico Olivelli 
> wrote:
>
> > This is an interesting discussion.
> > Good to see this kind of a discussion on the dev@ mailing list, this
> > way more people are aware of the fact that we are a project in the ASF
> > and there is a Project Management Committee.
> >
> > I have been following a few Apache projects for a while, and I believe
> > that this kind of discussion should be run on the private@ mailing
> > list.
> > It is the PMC that usually deals with this stuff.
> >
> > As Tison said, the common practice is that you never remove anyone
> > from a PMC or from the Committers list.
> >
> > This happens only in rare cases where an individual behaves in such a
> > way that the Project or the Foundation could be damaged,
> > for instance if you speak on behalf of the project and you offend
> > someone publicly.
> >
> > Inactive contributors/committers/PMC members do not do any harm to a
> > project.
> >
> > Some projects have some rules that you cannot participate in official
> > VOTEs if you are not "active".
> >
> > If anyone has some problems with someone in the community, then they
> > can reach out to priv...@pulsar.apache.org and the PMC will listen to
> > the problem and take actions.
> >
> > my 2 cents
> >
> > Enrico
> >
> > On Fri, Mar 3, 2023 at 4:39 AM Yunze Xu wrote:
> > >
> > > As a PMC member, I don't like playing a game of determining who should
> > > be removed from the PMC either.
> > >
> > > I hear a viewpoint that someone participates in the community
> > > only to join a PMC so that he can benefit from it. After becoming a
> > > PMC member, he is never active in the community. It might be true but
> > > I think it's acceptable. Making such a rule won't prevent such cases.
> > > If he wants, he can make use of the rule and keep himself "active" to
> > > avoid being kicked out of the PMC, even though the active state is fake.
> > >
> > > I'm not against the way to remove (or something else that sounds good)
> > > a PMC member because none of these ways is perfect. However, I'm
> > > STRONGLY AGAINST changing a rule that has been applied for some time
> > > unless it can be proved 

Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-06 Thread 太上玄元道君
Hi Asaf,
I've updated the PIP, PTAL

Thanks,
Tao Jiuming

On Sun, Mar 5, 2023 at 9:00 PM Asaf Mesika wrote:

> On Thu, Mar 2, 2023 at 12:57 PM 太上玄元道君  wrote:
>
> > > I  think you should fix this explanation:
> >
> > Thanks! I would like to copy the context you provided into the PIP
> > motivation. Your description is more detailed, so developers don't have
> > to go through the code.
> >
>
> Sure
>
>
> >
> > > Today the quota is checked periodically, right? So that's how the
> > > operator knows the cost in terms of I/O is limited.
> > > Now you are adding one additional I/O per collection, every 1 min by
> > > default. That's a lot perhaps. How long is the check interval today?
> >
> > Actually, I don't want to introduce additional costs, I thought we
> > could cache its result, so that it won't introduce additional costs.
> > It may be that I did not make it clear in the PIP and caused this
> > misunderstanding, sorry.
> >
>
> Ok, just to verify: You plan to modify the code that periodically runs the
> backlog quota check, so that the result is cached there? This way, when you
> pull that information from that code every 1 min to expose it as a metric,
> it will have 0 I/O cost?
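
A minimal sketch of the caching idea being verified here, assuming a periodic quota checker that already knows the publish time of the oldest backlog message after doing its own I/O. The class and method names are hypothetical and not taken from the Pulsar code base; the point is only that the metrics path reads a value from memory.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: class and method names are hypothetical, not Pulsar APIs.
class BacklogQuotaCheckCache {

    private static final long NOT_CHECKED_YET = -1L;

    // Publish time (epoch millis) of the oldest backlog message, as seen by the last check.
    private final AtomicLong oldestBacklogPublishTimeMs = new AtomicLong(NOT_CHECKED_YET);

    /** Called by the periodic quota checker once it has done its I/O. */
    void onQuotaCheckCompleted(long oldestPublishTimeMs) {
        oldestBacklogPublishTimeMs.set(oldestPublishTimeMs);
    }

    /**
     * Called on every metrics collection. Reads only the cached value, so it
     * performs no I/O. Returns -1 if no check has completed yet.
     */
    long backlogAgeSeconds(long nowMs) {
        long publishTimeMs = oldestBacklogPublishTimeMs.get();
        if (publishTimeMs == NOT_CHECKED_YET) {
            return -1L;
        }
        return Math.max(0L, (nowMs - publishTimeMs) / 1000L);
    }

    public static void main(String[] args) {
        BacklogQuotaCheckCache cache = new BacklogQuotaCheckCache();
        cache.onQuotaCheckCompleted(System.currentTimeMillis() - 90_000L);
        System.out.println("backlog age (s): " + cache.backlogAgeSeconds(System.currentTimeMillis()));
    }
}
```

The trade-off is staleness: the reported age is only as fresh as the last completed quota check.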
>
>
>
> >
> > > The user today can calculate quota used for size based limit, since
> > > there are two metrics that are exposed today on a topic level:
> > > "pulsar_storage_backlog_quota_limit" and "pulsar_storage_backlog_size".
> > > You can just divide the two to get a percentage.
> > > For the time-based limit, the only metric exposed today is the quota
> > > itself, "pulsar_storage_backlog_quota_limit_time".
> >
> > I only noticed `pulsar_storage_backlog_size` but missed
> > `pulsar_storage_backlog_quota_limit` and
> > `pulsar_storage_backlog_quota_limit_time`. Many thanks for your reminder.
> >
> >
> > So, in this condition, we already have the following topic-level metrics:
> > `pulsar_storage_backlog_size`: The total backlog size of the topics of
> > this topic owned by this broker (in bytes).
> > `pulsar_storage_backlog_quota_limit`: The total amount of the data in
> > this topic that limits the backlog quota (bytes).
> > `pulsar_storage_backlog_quota_limit_time`: The backlog quota limit in
> > time (seconds). (This metric does not exist in the docs and needs to be
> > added.)
> >
> >
> > We just need to add a new topic-level metric named
> > `pulsar_storage_earliest_msg_publish_time_in_backlog` that indicates the
> > publish time of the earliest message in the backlog.
> > So users could get `pulsar_backlog_size_quota_used_percentage` by dividing
> > `pulsar_storage_backlog_size` by `pulsar_storage_backlog_quota_limit`
> > (`pulsar_storage_backlog_size` / `pulsar_storage_backlog_quota_limit`),
> > and could get `pulsar_backlog_time_quota_used_percentage` by dividing
> > `now - pulsar_storage_earliest_msg_publish_time_in_backlog` by
> > `pulsar_storage_backlog_quota_limit_time`
> > (`(now - pulsar_storage_earliest_msg_publish_time_in_backlog)` /
> > `pulsar_storage_backlog_quota_limit_time`).
> >
>
> I think there is a problem with the name
> `pulsar_storage_earliest_msg_publish_time_in_backlog` in the topic-level:
> * First, I prefer exposing the age rather than the publish time.
> * Second, it's a bit hard to figure out the meaning of the earliest msg in
> the backlog.
>
> Maybe `pulsar_storage_backlog_age_seconds`? In the explanation you can
> write: "The age (time passed since it was published) of the earliest
> unacknowledged message based on the topic's
> existing subscriptions" ?
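
To make the two ratios discussed above concrete, here is a small sketch of the arithmetic a user or dashboard could do from the exposed metrics. The class and method names are hypothetical, and treating a non-positive limit as "no quota configured" is an assumption made for this example. The inputs correspond to `pulsar_storage_backlog_size`, `pulsar_storage_backlog_quota_limit`, the proposed backlog age in seconds, and `pulsar_storage_backlog_quota_limit_time`.

```java
import java.util.OptionalDouble;

// Sketch only: names are hypothetical; inputs are the metric values discussed in the thread.
class BacklogQuotaUsage {

    /** Size-based usage: backlog size divided by the size quota limit, as a percentage. */
    static OptionalDouble sizeQuotaUsedPercentage(long backlogSizeBytes, long quotaLimitBytes) {
        return percentage(backlogSizeBytes, quotaLimitBytes);
    }

    /** Time-based usage: age of the oldest backlog message divided by the time quota limit. */
    static OptionalDouble timeQuotaUsedPercentage(long backlogAgeSeconds, long quotaLimitTimeSeconds) {
        return percentage(backlogAgeSeconds, quotaLimitTimeSeconds);
    }

    private static OptionalDouble percentage(long used, long limit) {
        // Assumption: a non-positive limit means no quota is configured.
        if (limit <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(100.0 * used / limit);
    }

    public static void main(String[] args) {
        System.out.println(sizeQuotaUsedPercentage(50L, 200L));
        System.out.println(timeQuotaUsedPercentage(30L, 60L));
    }
}
```

Exposing the age directly keeps the time-based ratio a plain division, the same as the size-based one, with no need to subtract from `now` on the reader's side.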
>
>
>
> >
> > The backlog quota time checker runs periodically, so we can cache its
> > result, and it won't add much cost.
> >
> > Pulsar also exposes subscription-level `backlogSize` and
> > `earliestMsgPublishTimeInBacklog` in Pulsar-Admin
> > <https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139>
> > if `subscriptionBacklogSize` and `getEarliestTimeInBacklog` are true.
> > We can also expose `backlogQuotaLimiteSize` and `backlogQuotaLimitTime`
> > of the topic to PulsarAdmin.
> >
>
> What relationship do you see between Pulsar exposing
> subscriptionBacklogSize and earliestMsgPublishTimeInBacklog per
> subscription, and exposing the backlog quota limits in Pulsar Admin?
>
> Limits can be exposed to Pulsar Admin, since it has 0 cost associated with
> it.
> I think it's a good idea to do that.
> The quota usage can also be exposed to Pulsar Admin, since we pull that
> data from the backlog quota checker cache, so it has 0 cost as well.
>
> As we said in a previous email, we can also expose
> `backlogQuotaTimeOldestBacklogAgeSubscriptionName`.
>
>
> >
> > After users receive the backlog alert from metrics alerting systems, they
> > can get the topic name, then, they can request Topics#getStats
> > <
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139
> > 

Re: [Vote] PIP-245: Make subscriptions of non-persistent topic non-durable

2023-03-06 Thread guo jiwei
+1 (binding)

Regards
Jiwei Guo (Tboy)

On Mon, Mar 6, 2023 at 9:59 AM Yunze Xu  wrote:
>
> +1 (binding)
>
> Thanks,
> Yunze
>
> On Fri, Mar 3, 2023 at 11:46 AM PengHui Li  wrote:
> >
> > +1 (binding)
> >
> > Penghui
> >
> > > On Feb 13, 2023, at 14:56, Jiuming Tao  
> > > wrote:
> > >
> > > Hi all,
> > >
> > > I would like to start a VOTE on `PIP-245: Make subscriptions of 
> > > non-persistent topic non-durable`.
> > >
> > > Motivation:
> > >
> > > There are two types of subscriptions for a topic: Durable and Non-durable.
> > >
> > > We create a Consumer with a Durable subscription and a Reader with a 
> > > Non-durable subscription.
> > >
> > > > But for NonPersistentTopic, creating a Durable subscription is
> > > > meaningless: NonPersistentSubscription doesn't have a ManagedCursor to
> > > > persist its data. After its consumer disconnects, the subscription
> > > > can't be removed automatically unless we set the value of
> > > > subscriptionExpirationTimeMinutes greater than 0.
> > > >
> > > > subscriptionExpirationTimeMinutes controls subscription expiration for
> > > > both NonPersistentTopic and PersistentTopic. If we set its value
> > > > greater than 0, it may lead to data loss (the durable subscriptions of
> > > > a PersistentTopic can also be removed).
> > > >
> > > > Non-durable subscriptions are removed automatically after all
> > > > the consumers disconnect; that is the existing logic.
> > > >
> > > > To remove the subscriptions of a NonPersistentTopic that have no active
> > > > consumers, and for the reasons above, we can make all
> > > > the subscriptions of a NonPersistentTopic Non-durable.
> > >
> > >
> > >
> > > For more details, you can read: 
> > > https://github.com/apache/pulsar/issues/19448 
> > > 
> > >
> > > And the discuss thread is available at: 
> > > https://lists.apache.org/thread/2ltmyglnb25jy8nk58twkwbglws43bst 
> > > 
> > >
> > > Thanks,
> > > Tao Jiuming
> >


[ANNOUNCE] Apache Pulsar Node.js client 1.8.1 released

2023-03-06 Thread Baodi Shi
The Apache Pulsar team is proud to announce Apache Pulsar Node.js client
version 1.8.1.

Pulsar is a highly scalable, low latency messaging platform running on
commodity hardware. It provides simple pub-sub semantics over topics,
guaranteed at-least-once delivery of messages, automatic cursor management
for subscribers, and cross-datacenter replication.

For Pulsar Node.js client release details and downloads, visit:
https://www.npmjs.com/package/pulsar-client

Release Notes are at:
https://github.com/apache/pulsar-client-node/releases

We would like to thank the contributors that made the release possible.

Regards,

The Pulsar Team



Thanks,
Baodi Shi