Re: [ANNOUNCE] Yubiao Feng as new PMC member in Apache Pulsar

2024-01-09 Thread
Congrats! Yubiao.

Thanks,
Bo

Max Xu wrote on Tue, Nov 14, 2023 at 11:10:
>
> Congratulations! Yubiao, well deserved
>
> Best,
> Max Xu
>
>
> On Mon, Nov 13, 2023 at 3:37 PM mattison chao 
> wrote:
>
> > Dear Community,
> >
> > We are thrilled to announce that Yubiao Feng
> > https://github.com/poorbarcode has been invited and has accepted the role
> > of a member of the Apache Pulsar Project Management Committee (PMC).
> >
> > Yubiao Feng has proven to be an invaluable asset to our community,
> > consistently showcasing dedication and active engagement through
> > substantial contributions. Beyond his noteworthy technical input, Yubiao
> > plays a crucial role in meticulously reviewing pull requests, thereby
> > ensuring the overall excellence of our project. We eagerly anticipate and
> > appreciate his ongoing contributions. On behalf of the Pulsar PMC, we
> > extend a heartfelt welcome and congratulations to Yubiao Feng.
> >
> > Sincerely,
> > Mattison


Re: [VOTE] PIP-301: Introduce LoadBalanceResources to unify the load-date CRUD

2023-09-19 Thread
+1 (binding)

Thanks,
Bo


Re: [VOTE] PIP-290 Support WSS E2E encryption and not need to expose the private key to the WebSocket Proxy

2023-08-23 Thread
+1 (binding)

The changes are minimal, and the concept is easy for users to
understand. I support this PIP.

Thanks,
Bo

Yubiao Feng wrote on Wed, Aug 23, 2023 at 10:49:
>
> Hi Mattison
>
> > but anyway. IMO, it's better to split the
> > implementation PR into multiple that
> > will help the reviewer review this
> > PR more easily.
>
> Good suggestion.
> After removing tests, the amount of code lines should be very little.
> Mainly changes in the class  `ProducerHandler`.
>
> > IMO, it's better to introduce a public API
> > to help client support it. but it's fine
> > to use it to solve the web socket
> > problem now.
>
> Good suggestion.
> We can talk about improving the client design and adding an API later.
>
> Thanks
> Yubiao Feng
>
>
> On Tue, Aug 22, 2023 at 12:46 PM  wrote:
>
> > +1 (binding)
> >
> > This proposal looks great to me. But I've got several concerns which will
> > not affect this PIP voting.
> >
> > 1. You mixed compression and E2E encryption support in one proposal. I am
> > unsure if we should split them into two parts(compression & E2E) to help
> > make the proposal not too complex. but anyway. IMO, it's better to split
> > the implementation PR into multiple that will help the reviewer review this
> > PR more easily.
> > 2. Actually, we use a tricky way to keep the client from compressing
> > already-compressed data again (we set the producer compression type to
> > NONE, but we use the ProducerImpl to send a message entity with
> > compressed data). IMO, it's better to introduce a public API to help the
> > client support it, but it's fine to use it to solve the WebSocket problem now.
> >
> >
> >
> > Best,
> > Mattison
> > On 22 Aug 2023 at 11:16 +0800, PengHui Li , wrote:
> > > +1(binding)
> > >
> > > - The motivation looks good to me. The proposal will provide a real e2e
> > > encryption solution for the WebSocket proxy
> > > - The solution looks good to me. It will not introduce break changes and
> > > will use public APIs as much as possible. And it will not introduce any
> > > extra configuration. The API definition is clear and aligns with the
> > > existing naming pattern.
> > > > - For the public API changes. We already have an encryptionKey field, but
> > > > it holds key names, which is not aligned with the existing definition of the
> > > > encryptionKey in the binary protocol. Instead of introducing a new one like
> > > > encryptionKeyValue, the proposal will use the existing one (encryptionKey)
> > > > and check the format on the server side. It's not so good, but better than
> > > > adding a new one to confuse users.
> > > > - The proposal quality looks good to me. It provides enough context about
> > > > what is the existing solution and what is the new solution. And provides a
> > > > comprehensive example to show what the new way looks like.
> > >
> > > Regards,
> > > Penghui
> > >
> > >
> > >
> > > On Mon, Aug 21, 2023 at 5:30 PM Yubiao Feng
> > >  wrote:
> > >
> > > > Sorry, the PR link in the last email is ambiguous,
> > > > https://github.com/apache/pulsar/pull/20923 is the correct one.
> > > >
> > > > Thanks
> > > > Yubiao Feng
> > > >
> > > > > On Mon, Aug 21, 2023 at 4:07 PM Yubiao Feng <yubiao.f...@streamnative.io>
> > > > > wrote:
> > > >
> > > > > > Hello, Guys
> > > > > >
> > > > > > > Since there are no concerns in the discussion mail, I'd like to start
> > > > > > > voting for this PIP.
> > > > > > >
> > > > > > > The PIP link: https://github.com/apache/pulsar/pull/20923
> > > > > >
> > > > > > Thanks
> > > > > > Yubiao Feng
> > > > > >
> > > >
> >


[DISCUSS] PIP-285: Add pulsar_subscription_back_log_duration metric

2023-07-24 Thread
Hi, Pulsar Community

I opened a new PIP design PR.

https://github.com/apache/pulsar/pull/20859

Thanks,
Bo


Re: [VOTE] PIP-275: Introduce topicOrderedExecutorThreadNum to deprecate numWorkerThreadsForNonPersistentTopic in configuration

2023-06-19 Thread
+1(binding)

Thanks,
Bo

houxiaoyu wrote on Mon, Jun 19, 2023 at 14:04:
>
> Hi, community:
>
> This thread is to start a vote for PIP-275: Introduce
> topicOrderedExecutorThreadNum to deprecate
> numWorkerThreadsForNonPersistentTopic in configuration.
>
> Discussion thread:
> https://lists.apache.org/thread/hx8v824v5wdoz3kn44s4t9pzgfnqkt1o
> PIP-PR: https://github.com/apache/pulsar/pull/20507
>
> Sincerely
> Xiaoyu Hou


Re: [VOTE] PIP-251 Enhancing Transaction Buffer Stats and Introducing TransactionBufferInternalStats API

2023-05-11 Thread
+1(binding)

Thanks,
Bo

Nicolò Boschi wrote on Thu, May 11, 2023 at 16:38:
>
> +1 binding
>
> I'm happy that we're going to improve the monitoring tools for
> transactions,
> which is probably the aspect that is lacking more from a user perspective
>
>
> Nicolò Boschi
>
>
> On Wed, May 10, 2023 at 10:58, Xiangying Meng <xiangy...@apache.org> wrote:
>
> > Hello Pulsar community,
> >
> > This thread is to start a vote for PIP-251: Enhancing Transaction
> > Buffer Stats and Introducing TransactionBufferInternalStats API.
> >
> > Discussion thread:
> > https://lists.apache.org/thread/jsh2rod208xg28mojxwrod84p5zt1nrw
> > Issue:
> > https://github.com/apache/pulsar/issues/20291
> >
> > Voting will be open for at least 48 hours.
> >
> > Thanks!
> > Xiangying
> >


Re: [Python] Should we make the schema default compatible with Java client?

2023-03-30 Thread
Hi, Yunze:

+1

> Just checked this thread and found I didn't paste this issue:
> https://github.com/apache/pulsar-client-python/issues/108. You can see
> the schema compatibility strategy is FORWARD, then the sorted schema
> from the Java client overwrote the unsorted schema from the Python
> client. However, the Python consumer that uses the old schema failed
> to decode the message of the new schema.

If this changes the default behavior, users who upgrade the Python
client will register one more schema. This is a breaking change, so
existing users will need to update their Python schema code when they
upgrade the client. We need to call this out in the release notes.
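
To illustrate why flipping the sort default can lead to one more schema being registered, here is a minimal sketch. It does not use the Pulsar client; `record_schema` is a hypothetical helper that builds an Avro-style record as a plain dict, and it assumes the registry treats a definition with a different field order as a different schema.

```python
import json


def record_schema(name, fields, sorted_fields):
    """Build an Avro-style record definition as a plain dict.

    `fields` is a list of (field_name, avro_type) pairs in declaration order.
    This is a hypothetical stand-in for what a client would generate.
    """
    ordered = sorted(fields, key=lambda f: f[0]) if sorted_fields else list(fields)
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": n, "type": t} for n, t in ordered],
    }


# Declaration order taken from the User example in this thread.
declared = [("name", ["null", "string"]), ("age", "int"), ("score", "double")]

unsorted_def = json.dumps(record_schema("User", declared, sorted_fields=False))
sorted_def = json.dumps(record_schema("User", declared, sorted_fields=True))

# Same fields, but the serialized definitions are not identical.
definitions_differ = unsorted_def != sorted_def
print(definitions_differ)
```

If the two definitions differ, a topic that already holds the unsorted definition would see the sorted one as new after the upgrade, which is the situation the release note should describe.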

Thanks
Bo
>
> My goal is to make the Python client act the same as the Java client
> since the next formal release. Regarding how the broker processes it,
> I think it's another thing to be fixed.
>
> Thanks,
> Yunze
>
> On Thu, Mar 30, 2023 at 8:42 PM 丛搏  wrote:
> >
> > Hi, Yunze:
> >
> > > Regarding the 1st question, yes, that's why I open this thread to
> > > discuss. If we change these default values, the behavior of new Python
> > > clients will be like the Java client. In addition, it actually reverts
> > > the breaking change brought in #12232.
> >
> > I also kind of forget why #12232 changed the default behavior.
> > Maybe the python2 and python3 ordering rules are different.
> >
> > If we make sorting the default, will every topic that uses the
> > Python client register a new schema? Maybe we should add
> > special logic in the broker to check the Python client version
> > so that it does not register a new schema. Otherwise, the
> > impact may still be quite large.
> >
> > Thanks,
> > Bo
> > >
> > > Regarding the 2nd question, yes, they are both sorted in alphabetical
> > > order. I don't know the behavior of the .NET clients, for C++, Golang,
> > > Node.js clients, they all do not support generating schema definition
> > > from a DTO.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Thu, Mar 30, 2023 at 10:14 AM 丛搏  wrote:
> > > >
> > > > Hi, Yunze :
> > > >
> > > > 1. If the changes may cause some compatibility issues.
> > > > How do we solve the compatibility issues? It may be a
> > > > breaking change.
> > > >
> > > > 2. Another question is if sorting is enabled by default,
> > > > is the sorting rule the same as java or other clients?
> > > >
> > > > Putting aside the above two problems, I think it is
> > > > good to be consistent with other clients.
> > > >
> > > > Thanks,
> > > > Bo
> > > >
> > > > Eric Hare wrote on Wed, Mar 29, 2023 at 22:42:
> > > > >
> > > > > +1 - i think keeping the `_sorted_fields` and `_required` defaults 
> > > > > consistent between the clients is the way to go.
> > > > >
> > > > > > On Mar 29, 2023, at 7:09 AM, Yunze Xu 
> > > > > >  wrote:
> > > > > >
> > > > > > I found the Python client has two options to control the behavior:
> > > > > > 1. Set `_sorted_fields`. It's false by default in the Python client,
> > > > > > but it's true in the Java client. i.e. the Java client sorts all
> > > > > > fields by default.
> > > > > > > 2. Set `_required`. It's false by default for all types in the Python
> > > > > > > client, but it's only false for the string type in the Java client.
> > > > > >
> > > > > > i.e. given the following Java class:
> > > > > >
> > > > > > ```java
> > > > > > class User {
> > > > > > >     String name;
> > > > > > >     int age;
> > > > > > >     double score;
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > > We have to give the following definition in Python:
> > > > > >
> > > > > > ```python
> > > > > > class User(Record):
> > > > > > >     _sorted_fields = True
> > > > > > >     name = String()
> > > > > > >     age = Integer(required=True)
> > > > > > >     score = Double(required=True)
> > > > > > ```
> > > > > >
> > > > > > 

Re: [Python] Should we make the schema default compatible with Java client?

2023-03-30 Thread
Hi, Yunze:

> Regarding the 1st question, yes, that's why I open this thread to
> discuss. If we change these default values, the behavior of new Python
> clients will be like the Java client. In addition, it actually reverts
> the breaking change brought in #12232.

I also kind of forget why #12232 changed the default behavior.
Maybe the python2 and python3 ordering rules are different.

If we make sorting the default, will every topic that uses the
Python client register a new schema? Maybe we should add
special logic in the broker to check the Python client version
so that it does not register a new schema. Otherwise, the
impact may still be quite large.

Thanks,
Bo
>
> Regarding the 2nd question, yes, they are both sorted in alphabetical
> order. I don't know the behavior of the .NET clients, for C++, Golang,
> Node.js clients, they all do not support generating schema definition
> from a DTO.
>
> Thanks,
> Yunze
>
> On Thu, Mar 30, 2023 at 10:14 AM 丛搏  wrote:
> >
> > Hi, Yunze :
> >
> > 1. If the changes may cause some compatibility issues.
> > How do we solve the compatibility issues? It may be a
> > breaking change.
> >
> > 2. Another question is if sorting is enabled by default,
> > is the sorting rule the same as java or other clients?
> >
> > Putting aside the above two problems, I think it is
> > good to be consistent with other clients.
> >
> > Thanks,
> > Bo
> >
> > Eric Hare wrote on Wed, Mar 29, 2023 at 22:42:
> > >
> > > +1 - i think keeping the `_sorted_fields` and `_required` defaults 
> > > consistent between the clients is the way to go.
> > >
> > > > On Mar 29, 2023, at 7:09 AM, Yunze Xu  
> > > > wrote:
> > > >
> > > > I found the Python client has two options to control the behavior:
> > > > 1. Set `_sorted_fields`. It's false by default in the Python client,
> > > > but it's true in the Java client. i.e. the Java client sorts all
> > > > fields by default.
> > > > 2. Set `_required`. It's false by default for all types in the Python
> > > > client, but it's only false for the string type in the Java client.
> > > >
> > > > i.e. given the following Java class:
> > > >
> > > > ```java
> > > > class User {
> > > > >     String name;
> > > > >     int age;
> > > > >     double score;
> > > > }
> > > > ```
> > > >
> > > > We have to give the following definition in Python:
> > > >
> > > > ```python
> > > > class User(Record):
> > > > >     _sorted_fields = True
> > > > >     name = String()
> > > > >     age = Integer(required=True)
> > > > >     score = Double(required=True)
> > > > ```
> > > >
> > > > I see https://github.com/apache/pulsar/pull/12232 adds the
> > > > `_sorted_fields` field and disables the field sort by default. It
> > > > breaks compatibility with the Java client.
> > > >
> > > > IMO, we should make `_sorted_fields` true by default and `_required`
> > > > true for all types other than `String` by default.
> > > >
> > > > Thanks,
> > > > Yunze
> > > >
> > > > On Wed, Mar 29, 2023 at 4:00 PM Yunze Xu  wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >> Recently I found the default generated schema definition in the Python
> > > >> client is different from the Java client, which leads to some
> > > >> unexpected behavior.
> > > >>
> > > >> For example, given the following class definition in Python:
> > > >>
> > > >> ```python
> > > >> class Data(Record):
> > > > >>     i = Integer()
> > > >> ```
> > > >>
> > > >> The type of `i` field is a union: "type": ["null", "int"]
> > > >>
> > > >> While given the following class definition in Java:
> > > >>
> > > >> ```java
> > > >> class Data {
> > > > >>     private final int i;
> > > > >>     /* ... */
> > > >> }
> > > >> ```
> > > >>
> > > >> The type of `i` field is an integer: "type": "int"
> > > >>
> > > >> It brings an issue that if a Python consumer subscribes to a topic
> > > >> with schema defined above, then a Java producer will fail to create
> > > >> because of the schema incompatibility.
> > > >>
> > > >> Currently, the workaround is to change the schema compatibility
> > > >> strategy to FORWARD.
> > > >>
> > > >> Should we change the way to generate schema definition in the Python
> > > >> client to be compatible with the Java client? It could bring breaking
> > > >> changes to old Python clients, but it could guarantee compatibility
> > > >> with the Java client.
> > > >>
> > > >> If not, we still have to introduce an extra configuration to make
> > > >> Python schema compatible with Java schema. But it requires code
> > > >> changes. e.g. here is a possible solution:
> > > >>
> > > >> ```python
> > > >> class Data(Record):
> > > > >>     # NOTE: Users might have to add this extra field to control how
> > > > >>     # to generate the schema
> > > > >>     __java_compatible = True
> > > > >>     i = Integer()
> > > >> ```
> > > >>
> > > >> Thanks,
> > > >> Yunze
> > >


Re: [Python] Should we make the schema default compatible with Java client?

2023-03-29 Thread
Hi, Yunze :

1. If the changes may cause some compatibility issues.
How do we solve the compatibility issues? It may be a
breaking change.

2. Another question is if sorting is enabled by default,
is the sorting rule the same as java or other clients?

Putting aside the above two problems, I think it is
good to be consistent with other clients.

Thanks,
Bo

Eric Hare wrote on Wed, Mar 29, 2023 at 22:42:
>
> +1 - i think keeping the `_sorted_fields` and `_required` defaults consistent 
> between the clients is the way to go.
>
> > On Mar 29, 2023, at 7:09 AM, Yunze Xu  wrote:
> >
> > I found the Python client has two options to control the behavior:
> > 1. Set `_sorted_fields`. It's false by default in the Python client,
> > but it's true in the Java client. i.e. the Java client sorts all
> > fields by default.
> > 2. Set `_required`. It's false by default for all types in the Python
> > client, but it's only false for the string type in the Java client.
> >
> > i.e. given the following Java class:
> >
> > ```java
> > class User {
> >     String name;
> >     int age;
> >     double score;
> > }
> > ```
> >
> > We have to give the following definition in Python:
> >
> > ```python
> > class User(Record):
> >     _sorted_fields = True
> >     name = String()
> >     age = Integer(required=True)
> >     score = Double(required=True)
> > ```
> >
> > I see https://github.com/apache/pulsar/pull/12232 adds the
> > `_sorted_fields` field and disables the field sort by default. It
> > breaks compatibility with the Java client.
> >
> > IMO, we should make `_sorted_fields` true by default and `_required`
> > true for all types other than `String` by default.
> >
> > Thanks,
> > Yunze
> >
> > On Wed, Mar 29, 2023 at 4:00 PM Yunze Xu  wrote:
> >>
> >> Hi all,
> >>
> >> Recently I found the default generated schema definition in the Python
> >> client is different from the Java client, which leads to some
> >> unexpected behavior.
> >>
> >> For example, given the following class definition in Python:
> >>
> >> ```python
> >> class Data(Record):
> >>     i = Integer()
> >> ```
> >>
> >> The type of `i` field is a union: "type": ["null", "int"]
> >>
> >> While given the following class definition in Java:
> >>
> >> ```java
> >> class Data {
> >>     private final int i;
> >>     /* ... */
> >> }
> >> ```
> >>
> >> The type of `i` field is an integer: "type": "int"
> >>
> >> It brings an issue that if a Python consumer subscribes to a topic
> >> with schema defined above, then a Java producer will fail to create
> >> because of the schema incompatibility.
> >>
> >> Currently, the workaround is to change the schema compatibility
> >> strategy to FORWARD.
> >>
> >> Should we change the way to generate schema definition in the Python
> >> client to be compatible with the Java client? It could bring breaking
> >> changes to old Python clients, but it could guarantee compatibility
> >> with the Java client.
> >>
> >> If not, we still have to introduce an extra configuration to make
> >> Python schema compatible with Java schema. But it requires code
> >> changes. e.g. here is a possible solution:
> >>
> >> ```python
> >> class Data(Record):
> >>     # NOTE: Users might have to add this extra field to control how
> >>     # to generate the schema
> >>     __java_compatible = True
> >>     i = Integer()
> >> ```
> >>
> >> Thanks,
> >> Yunze
>


Re: [DISCUSS] Change PIP template

2023-03-29 Thread
+1

Good discussion!

Thanks,
Bo

Asaf Mesika wrote on Wed, Mar 29, 2023 at 20:11:
>
> So far only 1 PMC member reviewed it.
> Any other PMC member would like to review the new template for PIP?
>
> On Wed, Mar 22, 2023 at 1:10 PM Asaf Mesika  wrote:
>
> > Any other PMC member can take a look at the new template PR?
> > Ideally I would like to have 2-3 PMC member approval for this.
> >
> >
> > On 17 Mar 2023, at 18:23, Michael Marshall  wrote:
> >
> > Thanks for this initiative, Asaf.
> >
> > As part of this process, I would like for us to add a security and a
> > multi-tenancy section to the PIP template.
> >
> > As you suggest, the template conveys what the community values, and
> > these two sections must always be considered when changing Pulsar in
> > fundamental ways.
> >
> > (Thanks for already adding the security section to your template!)
> >
> > Thanks,
> > Michael
> >
> > On Thu, Mar 16, 2023 at 2:58 AM Asaf Mesika  wrote:
> >
> >
> > Here's the PR to remove the form and add a new issue template in Markdown
> > containing the suggested structure and description for each section.
> >
> > https://github.com/apache/pulsar/pull/19832
> >
> >
> > On Wed, Mar 1, 2023 at 3:43 PM Elliot West
> >  wrote:
> >
> > +1 Asaf
> >
> > I'd also suggest that we encourage the submission of relevant diagrams.
> > This is trivial to do with the GitHub markdown editor, but I suspect is
> > often neglected because users do not know the feature exists.
> >
> > On Wed, 1 Mar 2023 at 13:22, Asaf Mesika  wrote:
> >
> > Ok.
> >
> > I'll draft a PR and link it here when I'm done. Thanks!
> >
> > On Tue, Feb 28, 2023 at 7:08 AM PengHui Li  wrote:
> >
> > +1
> >
> > Penghui
> >
> > On Mon, Feb 27, 2023 at 9:24 PM Asaf Mesika wrote:
> >
> > Mails don't support things like markdown diagrams or images and are
> > generally less easy to read.
> > My proposal includes a required section called Links in which you need to
> > fill in the discussion thread in DEV mailing list and vote thread.
> >
> > On Mon, Feb 27, 2023 at 3:08 PM Girish Sharma <scrapmachi...@gmail.com>
> > wrote:
> >
> > Hi Asaf,
> > I was referring to the PIP process, as a whole, as explained in
> > https://github.com/apache/pulsar/blob/master/wiki/proposals/PIP.md
> > Someone looking at GitHub ticket would find an almost empty PIP GH issue
> > while the same PIP has had many discussions over here in the ML.
> > There is scope of improvement in the process where we either remove the
> > first step to create the PIP over at GitHub and directly present the PIP in
> > the first mail of the thread here, or we do all discussions in GH.
> > Both the ML and GH are searchable and linkable for tracking purposes.
> >
> > Regards
> >
> > On Mon, Feb 27, 2023 at 6:23 PM Asaf Mesika wrote:
> >
> > On Sun, Feb 26, 2023 at 2:49 PM Girish Sharma <scrapmachi...@gmail.com>
> > wrote:
> >
> > Good proposal Asaf.
> > I've also wondered why the PIP creation and discussion process is so
> > separated. The PIP discussion and voting starts off as a GitHub issue, but
> > all of its discussion happens here on the mailing list. Is there scope of
> > improvement in that process as well?
> >
> > Not sure I follow. Can you outline the problem exactly?
> >
> > Regards
> >
> > On Sun, Feb 26, 2023 at 6:16 PM tison wrote:
> >
> > Hi Asaf,
> >
> > I agree that, generally, a PIP is written as a whole and pasted as the
> > body.
> >
> > So +1 for your proposal.
> >
> > Additionally, I'm thinking of moving the doc of procedure (wiki/PIP.md) to
> > the contributions guide and use the new markdown template to supersede the
> > wiki/PIP-template.md. Then we don't need to hold the wiki folder.
> >
> > It can be an extended version to your proposal, so let's keep on your
> > proposal in this thread. Just for your reference.
> >
> > Best,
> > tison.
> >
> > Asaf Mesika wrote on Sun, Feb 26, 2023 at 19:18:
> >
> > Hi,
> >
> > I would like to suggest two changes I'd like to make to the PIP design
> > template:
> > 1. Remove the form - just have a markdown template fill the issue body as
> > it is created.
> > 2. Change the PIP template structure
> >
> > == Removing the form
> >
> > Today, when you want to submit a PIP, you are required to fill out a form
> > with boxes composed of 3-4 lines length.
> > It's not good because:
> > * It broadcasts to the author: we want a very small PIP, something that
> > fits those small boxes.
> > * It makes the PIP look like a bug, where you fill out fields.
> > * It doesn't allow having H2

Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-29 Thread
Hi, Yunze:

> It's better to describe how it could bring the benefit to transaction
> use cases, since now it's designed to be a configuration related to
> the transaction.
Sorry that I haven't explained in detail why the transaction needs it.
Let's look at a simple example:

```
Transaction txn = getTxn();
int num = 0;
MessageId messageId = null;
while (num < 10) {
    messageId = consumer.receive(5, TimeUnit.SECONDS).getMessageId();
    producer.newMessage(txn).value(messageId.toString()).sendAsync();
    num++;
}
// Ack inside the transaction so the ack and the produced messages commit together.
consumer.acknowledgeCumulativeAsync(messageId, txn);
txn.commit();
```
This example shows a transaction making the ack and the produce of
10 messages atomic.
If the messages we receive are duplicates, the messages we
produce will also be duplicates. Therefore, we need to ensure that
the messages we receive are not repeated and are ordered in
failover and exclusive subscription modes. But the client consumer
does not currently have this guarantee. And the guarantee must be exact;
otherwise, it will break the exactly-once semantics.
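
As a rough illustration of the kind of client-side filtering this needs, here is a minimal sketch. It is not the PIP's implementation: `ReceivedMessageFilter` is a hypothetical class, message positions are modeled as plain `(ledger_id, entry_id)` tuples, and batch indexes and partitions are ignored.

```python
class ReceivedMessageFilter:
    """Drops messages at or before the last position handed to the application.

    Assumes an ordered subscription (failover or exclusive), where positions
    only move forward, so a single high-water mark is enough.
    """

    def __init__(self):
        # (ledger_id, entry_id) of the newest message already delivered.
        self._last = None

    def accept(self, position):
        if self._last is not None and position <= self._last:
            # Redelivered after a reconnect: the application already saw it.
            return False
        self._last = position
        return True


message_filter = ReceivedMessageFilter()

# Simulated stream: after a reconnect the broker resends (1, 1) and (1, 2).
incoming = [(1, 0), (1, 1), (1, 2), (1, 1), (1, 2), (1, 3)]
delivered = [pos for pos in incoming if message_filter.accept(pos)]
print(delivered)
```

With such a filter in front of `receive`, the loop above would produce each message at most once even if the broker redelivers after a disconnect.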


> With this proposal and the option enabled, all these cases will filter
> the messages. That's why I think we have to consider the case for
> resetting cursors because it makes things worse.

Yes, this configuration may make the reset cursor more
difficult to use. But without this configuration, it is difficult to guarantee
the correctness of the transaction. Although we make the reset
cursor worse, we ensure correctness.

For transactions, we must first consider correctness and only then
which features to support (e.g. reset cursor).

Thanks,
Bo
>
> The three cases above do not involve transaction operations. So it
> would be better to understand the benefit if you can show some typical
> cases involved with transaction operations.
>
> Thanks,
> Yunze
>
> On Wed, Mar 29, 2023 at 12:02 PM 丛搏  wrote:
> >
> > Hi, all :
> >
> > Thanks to everyone who discussed it.
> >
> > Our current concerns include the following:
> >
> > 1. The filtering efficiency of the client consumer is not as
> > good as handling it directly with startMessageId
> > 2. It does not support reset cursor
> >
> > My previous PIP description added the configuration
> > in consumerBuilder. The definition of that configuration was not
> > clear, and it would cause great trouble for users.
> >
> > We can add a separate configuration that is only used for
> > acks with transactions. Simple example:
> >
> > ```
> > ConsumerBuilder<T> transactionConfiguration(ConsumerTransactionConfiguration config);
> >
> > @Builder
> > @Data
> > @NoArgsConstructor
> > @AllArgsConstructor
> > @InterfaceAudience.Public
> > @InterfaceStability.Stable
> > public class ConsumerTransactionConfiguration {
> >     boolean isFilterReceivedMessagesEnabled = false;
> > }
> > ```
> >
> > If the startMessageId design can provide this feature,
> > we can remove the configuration. And if there is already a closed-loop
> > solution based on startMessageId, I agree to use startMessageId.
> >
> > As for the reset cursor, I think it is another problem,
> > not related to this PIP.
> >
> > Thanks,
> > Bo
> >
> > 丛搏 wrote on Fri, Mar 24, 2023 at 18:53:
> > >
> > > Hi, Michael:
> > >
> > > I thought about it carefully, and using 'startMessageId'
> > > is indeed a good idea. But it is more complicated: we
> > > need to ensure its absolute correctness and take
> > > performance into consideration. If you can come up
> > > with a closed-loop solution based on 'startMessageId',
> > > I support you. If it can't account for both performance
> > > and correctness, I think we can combine
> > > our two solutions. Yours would ensure that messages
> > > are, to a certain degree, not re-delivered, which
> > > reduces the overhead caused by the repeated delivery
> > > of many messages. My design would be responsible for
> > > the final consistency.
> > >
> > > Thanks,
> > > Bo
> > >
> > > Michael Marshall wrote on Wed, Mar 22, 2023 at 14:22:
> > > >
> > > > Because we already send the `startMessageId`, there is a chance where
> > > > we might not even need to update the protocol for the
> > > > CommandSubscribe. In light of that, I quickly put together a PR
> > > > showing how that field might be used to inform the broker where to
> > > > start the read position for the cursor.
> > > >
> > > > https://github.com/apache/pulsar/pull/19892
> > > >
>

Re: [VOTE] PIP-254: Support configuring client version with a description suffix

2023-03-29 Thread
+1 (binding)

Thanks,
Bo

Lin Lin wrote on Mon, Mar 27, 2023 at 17:49:
>
> +1
>
> Thanks,
> Lin Lin
>
> On 2023/03/15 07:54:20 Yunze Xu wrote:
> > Hi all,
> >
> > This thread is to start the vote for PIP-254.
> >
> > Discussion thread:
> > https://lists.apache.org/thread/65cf7w76tt23sbsjnr8rpfxqf1nt9s9l
> >
> > PIP link: https://github.com/apache/pulsar/issues/19705
> >
> > Thanks,
> > Yunze
> >


Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-28 Thread
Hi, all :

Thanks to everyone who discussed it.

Our current concerns include the following:

1. The filtering efficiency of the client consumer is not as
good as handling it directly with startMessageId
2. It does not support reset cursor

My previous PIP description added the configuration
in consumerBuilder. The definition of that configuration was not
clear, and it would cause great trouble for users.
We can add a separate configuration that is only used for
acks with transactions. Simple example:

```
ConsumerBuilder<T> transactionConfiguration(ConsumerTransactionConfiguration config);

@Builder
@Data
@NoArgsConstructor
@AllArgsConstructor
@InterfaceAudience.Public
@InterfaceStability.Stable
public class ConsumerTransactionConfiguration {
    boolean isFilterReceivedMessagesEnabled = false;
}
```

If the startMessageId design can provide this feature,
we can remove the configuration. And if there is already a closed-loop
solution based on startMessageId, I agree to use startMessageId.

As for the reset cursor, I think it is another problem,
not related to this PIP.
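
To make the opt-in nature of the setting concrete, here is a small sketch in Python rather than Java. The names are hypothetical stand-ins for the configuration above, and positions are again modeled as `(ledger_id, entry_id)` tuples; it only shows that nothing changes unless the flag is enabled.

```python
from dataclasses import dataclass


@dataclass
class ConsumerTransactionConfiguration:
    # Off by default so existing consumers keep their current behavior.
    filter_received_messages_enabled: bool = False


def deliver(incoming, config):
    """Return the positions that would be handed to the application."""
    if not config.filter_received_messages_enabled:
        # Default path: redelivered messages are passed through unchanged.
        return list(incoming)
    delivered, last = [], None
    for position in incoming:
        if last is None or position > last:
            delivered.append(position)
            last = position
    return delivered


incoming = [(1, 0), (1, 1), (1, 0), (1, 2)]
unfiltered = deliver(incoming, ConsumerTransactionConfiguration())
filtered = deliver(incoming, ConsumerTransactionConfiguration(filter_received_messages_enabled=True))
print(unfiltered, filtered)
```

Keeping the flag off by default means consumers that rely on redelivery, for example after a cursor reset, are unaffected unless they choose the transactional behavior.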

Thanks,
Bo

丛搏 wrote on Fri, Mar 24, 2023 at 18:53:
>
> Hi, Michael:
>
> I thought about it carefully, and using 'startMessageId'
> is indeed a good idea. But it is more complicated: we
> need to ensure its absolute correctness and take
> performance into consideration. If you can come up
> with a closed-loop solution based on 'startMessageId',
> I support you. If it can't account for both performance
> and correctness, I think we can combine
> our two solutions. Yours would ensure that messages
> are, to a certain degree, not re-delivered, which
> reduces the overhead caused by the repeated delivery
> of many messages. My design would be responsible for
> the final consistency.
>
> Thanks,
> Bo
>
> Michael Marshall wrote on Wed, Mar 22, 2023 at 14:22:
> >
> > Because we already send the `startMessageId`, there is a chance where
> > we might not even need to update the protocol for the
> > CommandSubscribe. In light of that, I quickly put together a PR
> > showing how that field might be used to inform the broker where to
> > start the read position for the cursor.
> >
> > https://github.com/apache/pulsar/pull/19892
> >
> > The PR is not complete, but it does convey the general idea. I wrote
> > additional details in the draft's description.
> >
> > Thanks,
> > Michael
> >
> > On Tue, Mar 21, 2023 at 11:31 PM Michael Marshall  
> > wrote:
> > >
> > > I am not following your objections to the protocol solution. It might
> > > be more productive if I provided a draft PR with a sample
> > > implementation. I'm not sure that I'll have time, but I'll try to put
> > > something together this week.
> > >
> > > > At least it will simplify the process of using cumulative ack with the
> > > > transaction.
> > >
> > > Is this the underlying motivation for the PIP?
> > >
> > > From my perspective, the PIP is seeking to decrease duplicate messages
> > > experienced due to disconnections from the broker.
> > >
> > > > The problem of the resetting cursor can be optimized in the future
> > >
> > > Why should we push off solving this problem? It seems fundamental to
> > > this PIP and should not be ignored. At the very least, I think we need
> > > to have an idea of what the future solution would be before we defer
> > > its implementation.
> > >
> > > Thanks,
> > > Michael
> > >
> > >
> > > On Tue, Mar 21, 2023 at 10:52 PM 丛搏  wrote:
> > > >
> > > > Hi, Michael
> > > > > In this case, the consumer does not have the source of truth for the
> > > > > readPosition. It would leave the new protocol field for `readPosition`
> > > > > empty and the broker would use its source of truth for the read
> > > > > position.
> > > > > Suppose the application thread has already received all the messages.
> > > > > We still need a correct `startPosition`, right? But with your approach,
> > > > > we would assume the consumer hasn't received any messages.
> > > >
> > > > >
> > > > > > > Why do we need to invoke `BlockingQueue.take` and `synchronized` in
> > > > > > > the same logic? That is bad code.
> > > > >
> > > > > We don't need to synchronize this code here because the logic will
> > > > > come after the consumer has been disconnected from broker a and before
> > > > > it is connected t

Re: [ANNOUNCE] Qiang Zhao as new PMC member in Apache Pulsar

2023-03-28 Thread
Congrats!

Thanks,
Bo

On Wed, Mar 29, 2023 at 11:29, Baodi Shi wrote:
>
>  Congrats! Qiang.
>
>
> Thanks,
> Baodi Shi
>
>
> On Mar 29, 2023 at 11:27:58, Horizon <1060026...@qq.com.invalid> wrote:
>
> > Congrats!
> >
> > On Mar 29, 2023 at 11:24, tison wrote:
> >
> >
> > Congrats!
> >
> >
> >


Re: [DISCUSS] Cherry-pick #15121 into branch-2.10 to solve the issue sasl authentication failure

2023-03-28 Thread
+1 (Solution 2)

Thanks,
Bo

On Wed, Mar 29, 2023 at 09:47, PengHui Li wrote:
>
> It looks good to me to make it optional in branch-2.10, since we don't want to
> introduce any breaking behavior in subsequent patch releases.
>
> Thanks,
> Penghui
>
> On Tue, Mar 28, 2023 at 9:39 PM Dezhi Liu  wrote:
>
> > I agree with cherry-picking PR 15121 to branch-2.10 and keeping compatibility.
> >
> >
> > Best,
> > Dezhi
> >
> > On 2023/03/28 11:52:06 Yubiao Feng wrote:
> > > Hi community
> > >
> > > ### Summary
> > > > The admin client (`pulsar-admin`) and the Java client (PulsarAdmin) throw
> > > > an Unauthorized exception in both of the following scenarios:
> > > > - The cluster has more than one broker (see Issue 1 below).
> > > > - Authentication is enabled for both Pulsar-Proxy and Pulsar-Broker (see
> > > > Issue 2 below).
> > >
> > > ```
> > > bin/pulsar-admin topics stats persistent://public/default/tp1
> > > 2023-03-28T07:30:58,453+ [main] INFO
> > > org.apache.pulsar.client.impl.auth.AuthenticationSasl - JAAS loginContext
> > > is: PulsarAdmin.
> > > 2023-03-28T07:30:58,583+ [main] INFO
> > > org.apache.pulsar.common.sasl.JAASCredentialsContainer - successfully
> > > logged in.
> > > 2023-03-28T07:30:58,587+ [pulsar-tgt-refresh-thread] INFO
> > > org.apache.pulsar.common.sasl.TGTRefreshThread - TGT refresh thread
> > started.
> > > 2023-03-28T07:30:58,612+ [pulsar-tgt-refresh-thread] INFO
> > > org.apache.pulsar.common.sasl.TGTRefreshThread - Client principal is "
> > > pulsar-ad...@sn.io".
> > > 2023-03-28T07:30:58,613+ [pulsar-tgt-refresh-thread] INFO
> > > org.apache.pulsar.common.sasl.TGTRefreshThread - Server principal is
> > > "krbtgt/sn...@sn.io".
> > > 2023-03-28T07:30:58,617+ [pulsar-tgt-refresh-thread] INFO
> > > org.apache.pulsar.common.sasl.TGTRefreshThread - TGT valid starting at:
> > > Tue Mar 28 07:30:58 UTC 2023
> > > 2023-03-28T07:30:58,617+ [pulsar-tgt-refresh-thread] INFO
> > > org.apache.pulsar.common.sasl.TGTRefreshThread - TGT expires:
> > > Wed Mar 29 07:30:58 UTC 2023
> > > 2023-03-28T07:30:58,617+ [pulsar-tgt-refresh-thread] INFO
> > > org.apache.pulsar.common.sasl.TGTRefreshThread - TGT refresh sleeping
> > > until: Wed Mar 29 03:12:29 UTC 2023
> > > 2023-03-28T07:30:59,861+ [main] INFO
> > > org.apache.pulsar.client.impl.auth.PulsarSaslClient - Using
> > > JAAS/SASL/GSSAPI auth to connect to server Principal broker/pulsar03,
> > > HTTP 401 Unauthorized
> > > Reason: HTTP 401 Unauthorized
> > > ```
> > >
> > > > I want to cherry-pick https://github.com/apache/pulsar/pull/15121 into
> > > > branch-2.10 to fix it.
> > >
> > > ### Background
> > > > When using Kerberos for authentication, Pulsar works like this:
> > > > - The client initializes a ticket.
> > > > - The client sends a request to the broker.
> > > > - The broker identifies the client (the broker can confirm through Kerberos
> > > > that the ticket is valid).
> > > > - The broker sends a token (we call it sasl_role_token) to the client. At
> > > > this moment, the session is successfully created.
> > > > - From then on, the client is authenticated through sasl_role_token and
> > > > Kerberos is no longer used.
> > > >
> > > > The `sasl_role_token` is generated by this logic: `Sha512(saslRoleName,
> > > > ${secret})`. We call the `secret` sasl_sign_secret.
> > > > In version `2.10.x`, the variable `secret` is a random string initialized
> > > > when the broker starts.
> > >
> > > ### Issue 1
> > > > Suppose a cluster includes two brokers and the topic `public/default/tp1` is
> > > > owned by broker-0. We get an error when we send `pulsar-admin topics
> > > > stats public/default/tp1` to broker-1.
> > > >
> > > > The whole process goes like this:
> > > > - The client succeeds in authentication and gets a token from broker-1.
> > > > - broker-1 tells the client to redirect to broker-0.
> > > > - The client's request to broker-0 carries the sasl_role_token generated by
> > > > broker-1.
> > > > - broker-0 cannot decode the sasl_role_token because its secret differs
> > > > from broker-1's, so it responds with 401.
> > >
> > > ### Issue 2
> > > > After authentication is enabled for both Pulsar-Proxy and Pulsar-Broker,
> > > > the error occurs as follows:
> > > > - The client succeeds in authentication and gets a token from Pulsar Proxy.
> > > > - The proxy forwards the request to the broker.
> > > > - The broker cannot decode the `sasl_role_token` because its secret differs
> > > > from Pulsar Proxy's, so it responds with 401.
> > >
> > > > ### Solutions
> > > > There are two ways to solve this issue:
> > > >
> > > > Solution 1
> > > > - The client saves a separate token for each server (e.g. ["broker-0",
> > > > "broker-1", "pulsar-proxy"]), so that no server receives a token issued by
> > > > another server. This fixes Issue 1.
> > > > - Proxy and Broker do not enable authentication simultaneously. This fixes
> > > > Issue 2.
> > > >
> > > > Solution 2
> > > > - Make `sasl_sign_secret` configurable. Users can configure this variable
> > > > to the same value on every server, so that all servers can decode every
> > > > `sasl_role_token`. PR #15121 does this.
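
The failure in Issue 1 and the effect of Solution 2 can be reproduced in a small model. This is only a sketch under stated assumptions: HMAC-SHA512 stands in for the `Sha512(saslRoleName, ${secret})` logic, the `role&signature` token layout and the names `issue_token` / `verify_token` are hypothetical, and none of this is Pulsar's actual signer code.

```python
import hashlib
import hmac
import secrets


def issue_token(role: str, secret: bytes) -> str:
    # Hypothetical token layout: "<role>&<hex signature>".
    signature = hmac.new(secret, role.encode("utf-8"), hashlib.sha512).hexdigest()
    return f"{role}&{signature}"


def verify_token(token: str, secret: bytes):
    # Returns the role if the signature matches this server's secret, else None.
    role, _, signature = token.rpartition("&")
    expected = hmac.new(secret, role.encode("utf-8"), hashlib.sha512).hexdigest()
    return role if hmac.compare_digest(signature, expected) else None


# 2.10.x behavior as described above: each broker picks a random secret at startup.
broker0_secret = secrets.token_bytes(32)
broker1_secret = secrets.token_bytes(32)

# broker-1 issues the token, then the client is redirected to broker-0.
token_from_broker1 = issue_token("pulsar-admin", broker1_secret)
rejected = verify_token(token_from_broker1, broker0_secret)  # mismatch, i.e. a 401

# Solution 2: every server is configured with the same sign secret.
shared_secret = b"same-configured-secret"
token_shared = issue_token("pulsar-admin", shared_secret)
accepted = verify_token(token_shared, shared_secret)
```

With per-process random secrets the second server cannot validate the token, which matches the 401 in the log above; with one configured secret any server can validate a token issued by any other.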
> > >
> > > I'd prefer 

Re: [VOTE] Pulsar Release 2.9.5 Candidate 2

2023-03-28 Thread
+1 (binding)

os: mac 12.3.1, Intel
java: OpenJDK 17.0.1

- Checked the signature
- Checked LICENSE
- Start standalone
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package (maven 3.8.6, openJDK 17.0.1)
- Run a simple transaction check

Thanks,
Bo

On Mon, Mar 27, 2023 at 23:09, Cong Zhao wrote:
>
> This is the third release candidate for Apache Pulsar, version 2.9.5.
>
> This release contains 103 commits by 30 contributors.
> https://github.com/apache/pulsar/compare/v2.9.4...v2.9.5-candidate-2
>
> *** Please download, test, and vote on this release. This vote will stay
> open
> for at least 72 hours ***
>
> Note that we are voting upon the source (tag), binaries are provided for
> convenience.
>
> Source and binary files:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.9.5-candidate-2/
>
> SHA-512 checksums:
> 5e1d0c1b38441cdcb36a2f4e59ab9755b39a5c4a0136e078e91ab9bc2169016f195268692cafd6f13a45248dba2e97959b41f3cfbc8659e3cbd0bade0c954998
> apache-pulsar-2.9.5-bin.tar.gz
>
> 72c9f47005636c6e629dd5117b15fdc13bfd9c7efe107be77a9d55b7dfcdda2f941003eb120ea8beeffe44c41bc41c385a2e5a9cb6540d2fe83a6d04ea53a7389d
> apache-pulsar-2.9.5-src.tar.gz
>
> 73d286af64e189cf91c0511d360d98371b7ade1eec67bc6acf6ff766784e9e40388d3da8ae99a206369feaf398b254fff36e2206077041c37b8055ee7edde86eea
> apache-pulsar-offloaders-2.9.5-bin.tar.gz
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1222/
>
> The tag to be voted upon:
> v2.9.5-candidate-2 (c75c811ee48f51cf74f399f5b364bc1527186b34)
> https://github.com/apache/pulsar/releases/tag/v2.9.5-candidate-2
>
> Pulsar's KEYS file containing PGP keys you use to sign the release:
> https://downloads.apache.org/pulsar/KEYS
>
> Docker images:
>
> 
> https://hub.docker.com/layers/czcoder/pulsar/2.9.5/images/sha256-c6d3435d5699cb3697ee2ddc4f8a45e0ac5e35d8aefd557e280b7cf91366b981?context=explore
>
> 
> https://hub.docker.com/layers/czcoder/pulsar-all/2.9.5/images/sha256-a09a8e177ca7856c29dc8b9828cd293c6d44b473add20d1877bb3137b94a20c5?context=explore
>
> 
> https://hub.docker.com/layers/czcoder/pulsar-grafana/2.9.5/images/sha256-c43a489c65cf6c407d6c3be6fc7a001227805b1aaa9413115cf55ba11a1e329f?context=explore
>
> Please download the source package, and follow the README to build
> and run the Pulsar standalone service.
>
> Thanks
> Cong Zhao


Re: [VOTE] Pulsar Release 2.11.1 Candidate-1

2023-03-27 Thread
+1 (binding)

system: mac os 12.6, Apple M1
maven: 3.8.5
java: OpenJDK 17.0.3

- Checked the signature
- Checked LICENSE
- Start standalone with zookeeper stream storage
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package
- Run a simple transaction check

Thanks,
Bo

On Mon, Mar 27, 2023 at 12:01, Yunze Xu wrote:
>
> +1 (binding)
>
> - Checked the signature
> - Build from source (Java 17, Ubuntu 20.04 WSL2)
> - Start standalone with both RocksDB and ZooKeeper
> - Run basic end-to-end Pulsar tests and topic operations via pulsar-shell
> - Run basic end-to-end Kafka tests with KoP 2.11.0.4
>
> Thanks,
> Yunze
>
> On Wed, Mar 22, 2023 at 4:51 PM guo jiwei  wrote:
> >
> > This is the first release candidate for Apache Pulsar, version 2.11.1.
> >
> > This release contains 188 commits by 53 contributors.
> > https://github.com/apache/pulsar/compare/v2.11.0...v2.11.1-candidate-1
> >
> > CI for this release candidate
> > https://github.com/Technoboy-/pulsar/pull/28
> >
> > *** Please download, test and vote on this release. This vote will stay open
> > for at least 72 hours ***
> >
> > Note that we are voting upon the source (tag), binaries are provided for
> > convenience.
> >
> > Source and binary files:
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.11.1-candidate-1
> >
> > SHA-512 checksums:
> >
> > 7bd5c432fdb888dfcc2a1595efe29206545db535ea996aa4d1ff851e957cc88ce4b54fce912dde84baf8ed40217149a0acffad6c49f02f348721350e5ae895dd
> >
> >  ./apache-pulsar-2.11.1-bin.tar.gz
> >
> >
> > 9e7bfac98e57a2a61216da77e48843bc4274828c9da1e695538d92a3ee929b52c4b0d2280feb73980d77f02c03f3c2dbc797673e69df19190ffee8e46760f305
> >
> >  ./apache-pulsar-2.11.1-src.tar.gz
> >
> > Maven staging repo:
> > https://repository.apache.org/content/repositories/orgapachepulsar-1220/
> >
> > The tag to be voted upon:
> > v2.11.1-candidate-1 (7cc41d7dec415acfeb1f96b68faaa2a80440e070)
> > https://github.com/apache/pulsar/releases/tag/v2.11.1-candidate-1
> >
> > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >
> > https://hub.docker.com/layers/mattison/pulsar-all/2.11.1-rc1/images/sha256-3d2c0bacbc6d34fa656466370744dedfd2b5a79df7a1178ea2fad27a6d6a
> >
> > https://hub.docker.com/layers/mattison/pulsar/2.11.1-rc1/images/sha256-1fda65d5637ad579ee36ab09448eb6ff3831583a9ef5b543bae78a565813cd95
> >
> > Please download the source package, and follow the
> > release-candidate-validation doc to build
> > and run the Pulsar standalone service.
> > https://pulsar.apache.org/contribute/validate-release-candidate
> >
> > Since the metadata store is changed from ZK to RocksDB, the
> > verification of the `stateful functions` needs to set the parameter
> > "export PULSAR_STANDALONE_USE_ZOOKEEPER=1"
> >
> >
> > Regards
> > Jiwei Guo (Tboy)


Re: [VOTE] Pulsar Release 2.10.4 Candidate 3

2023-03-27 Thread
+1(binding)

Please ignore the previous reply, I read it wrong, sorry.

+1 (binding)

os: mac 12.3.1, Intel
java: OpenJDK 17.0.1

- Checked the signature
- Checked LICENSE
- Start standalone
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package (maven 3.8.6, openJDK 11.0.12)
- Run a simple transaction check

Thanks,
Bo


On Mon, Mar 27, 2023 at 18:20, 丛搏 wrote:

>
> Hi, Xiangying:
>
> I found that
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.10.4-candidate-3/connectors/
> contains two pulsar-io-debezium-mssql-2.10.4.nar files.
>
> Thanks,
> Bo
>
> On Thu, Mar 23, 2023 at 18:27, guo jiwei wrote:
> >
> > +1 (binding)
> >
> > - Checked the signature
> > - Checked LICENSE
> > - Publish and consume messages
> > - Verified Function and State Function
> > - Verified Cassandra connector
> > - Build from the source package
> >
> >
> > Regards
> > Jiwei Guo (Tboy)
> >
> > On Wed, Mar 22, 2023 at 3:08 PM Xiangying Meng  wrote:
> > >
> > > This is the third release candidate for Apache Pulsar, version 2.10.4.
> > >
> > > This release contains 111 commits by 35 contributors.
> > > https://github.com/apache/pulsar/compare/v2.10.3...v2.10.4-candidate-3
> > >
> > > *** Please download, test and vote on this release. This vote will stay 
> > > open
> > > for at least 72 hours ***
> > >
> > > Note that we are voting upon the source (tag), binaries are provided for
> > > convenience.
> > >
> > > Source and binary files:
> > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.10.4-candidate-3/
> > >
> > > SHA-512 checksums:
> > > 59f0326643cca9ef16b45b4b522ab5a1c1d8dc32ac19897704f8231f9bd4cef02af722848646332db461a807daacc9cb87993b81dcf1429b1f23e3872a32
> > >  apache-pulsar-2.10.4-bin.tar.gz
> > > 5b2adbf0d371b79b1dbe141f152848049d19924151fa8827057038d81833accd70cf67429cb003aedb8d44ee705ed0609d49757e74fed377dce77b09d49062e3
> > >  apache-pulsar-2.10.4-src.tar.gz
> > >
> > > Maven staging repo:
> > > https://repository.apache.org/content/repositories/orgapachepulsar-1221/
> > >
> > > The tag to be voted upon:
> > > v2.10.4-candidate-3
> > > (e4898ac8eb37f698f29aa21e40a3abdda5489d45)
> > > https://github.com/apache/pulsar/releases/tag/v2.10.4-candidate-3
> > >
> > > Pulsar's KEYS file containing PGP keys you use to sign the release:
> > > https://downloads.apache.org/pulsar/KEYS
> > >
> > > Docker images:
> > >
> > > 
> > > https://hub.docker.com/layers/xiangyingmeng/pulsar/2.10.4/images/sha256-05bfb482c5b5aa66ac818651d8997745ac7d536ca0cb56bff8199a6de459ac45?context=repo
> > >
> > > 
> > > https://hub.docker.com/layers/xiangyingmeng/pulsar-all/2.10.4/images/sha256-d4f3de64a8ec4a9039ac500bbf4a0efae9a9f1d4e0a58e11cab020276dc5e6b3?context=repo
> > >
> > > Please download the source package, and follow the README to build
> > > and run the Pulsar standalone service.


Re: [VOTE] Pulsar Release 2.10.4 Candidate 3

2023-03-27 Thread
Hi, Xiangying:

I found that
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.10.4-candidate-3/connectors/
contains two pulsar-io-debezium-mssql-2.10.4.nar files.

Thanks,
Bo

On Thu, Mar 23, 2023 at 18:27, guo jiwei wrote:
>
> +1 (binding)
>
> - Checked the signature
> - Checked LICENSE
> - Publish and consume messages
> - Verified Function and State Function
> - Verified Cassandra connector
> - Build from the source package
>
>
> Regards
> Jiwei Guo (Tboy)
>
> On Wed, Mar 22, 2023 at 3:08 PM Xiangying Meng  wrote:
> >
> > This is the third release candidate for Apache Pulsar, version 2.10.4.
> >
> > This release contains 111 commits by 35 contributors.
> > https://github.com/apache/pulsar/compare/v2.10.3...v2.10.4-candidate-3
> >
> > *** Please download, test and vote on this release. This vote will stay open
> > for at least 72 hours ***
> >
> > Note that we are voting upon the source (tag), binaries are provided for
> > convenience.
> >
> > Source and binary files:
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.10.4-candidate-3/
> >
> > SHA-512 checksums:
> > 59f0326643cca9ef16b45b4b522ab5a1c1d8dc32ac19897704f8231f9bd4cef02af722848646332db461a807daacc9cb87993b81dcf1429b1f23e3872a32
> >  apache-pulsar-2.10.4-bin.tar.gz
> > 5b2adbf0d371b79b1dbe141f152848049d19924151fa8827057038d81833accd70cf67429cb003aedb8d44ee705ed0609d49757e74fed377dce77b09d49062e3
> >  apache-pulsar-2.10.4-src.tar.gz
> >
> > Maven staging repo:
> > https://repository.apache.org/content/repositories/orgapachepulsar-1221/
> >
> > The tag to be voted upon:
> > v2.10.4-candidate-3
> > (e4898ac8eb37f698f29aa21e40a3abdda5489d45)
> > https://github.com/apache/pulsar/releases/tag/v2.10.4-candidate-3
> >
> > Pulsar's KEYS file containing PGP keys you use to sign the release:
> > https://downloads.apache.org/pulsar/KEYS
> >
> > Docker images:
> >
> > 
> > https://hub.docker.com/layers/xiangyingmeng/pulsar/2.10.4/images/sha256-05bfb482c5b5aa66ac818651d8997745ac7d536ca0cb56bff8199a6de459ac45?context=repo
> >
> > 
> > https://hub.docker.com/layers/xiangyingmeng/pulsar-all/2.10.4/images/sha256-d4f3de64a8ec4a9039ac500bbf4a0efae9a9f1d4e0a58e11cab020276dc5e6b3?context=repo
> >
> > Please download the source package, and follow the README to build
> > and run the Pulsar standalone service.


Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-24 Thread
Hi, Michael:

I thought about it carefully, and using 'startMessageId'
is indeed a good idea. But it is more complicated: we
need to ensure its absolute correctness and take
performance into consideration. If you can come up
with a closed-loop solution based on 'startMessageId',
I will support it. If it cannot satisfy both performance
and correctness, I think we should combine our two
solutions. Yours would ensure that a certain share of
messages is not redelivered, which reduces the overhead
caused by repeatedly delivering many messages. My design
would be responsible for the final consistency.

Thanks,
Bo

On Wed, Mar 22, 2023 at 14:22, Michael Marshall wrote:
>
> Because we already send the `startMessageId`, there is a chance where
> we might not even need to update the protocol for the
> CommandSubscribe. In light of that, I quickly put together a PR
> showing how that field might be used to inform the broker where to
> start the read position for the cursor.
>
> https://github.com/apache/pulsar/pull/19892
>
> The PR is not complete, but it does convey the general idea. I wrote
> additional details in the draft's description.
>
> Thanks,
> Michael
>
> On Tue, Mar 21, 2023 at 11:31 PM Michael Marshall  
> wrote:
> >
> > I am not following your objections to the protocol solution. It might
> > be more productive if I provided a draft PR with a sample
> > implementation. I'm not sure that I'll have time, but I'll try to put
> > something together this week.
> >
> > > At least it will simplify the process of using cumulative ack with the
> > > transaction.
> >
> > Is this the underlying motivation for the PIP?
> >
> > From my perspective, the PIP is seeking to decrease duplicate messages
> > experienced due to disconnections from the broker.
> >
> > > The problem of the resetting cursor can be optimized in the future
> >
> > Why should we push off solving this problem? It seems fundamental to
> > this PIP and should not be ignored. At the very least, I think we need
> > to have an idea of what the future solution would be before we defer
> > its implementation.
> >
> > Thanks,
> > Michael
> >
> >
> > On Tue, Mar 21, 2023 at 10:52 PM 丛搏  wrote:
> > >
> > > Hi, Michael
> > > > In this case, the consumer does not have the source of truth for the
> > > > readPosition. It would leave the new protocol field for `readPosition`
> > > > empty and the broker would use its source of truth for the read
> > > > position.
> > > > Suppose the application thread has already received all the messages. We
> > > > still need a correct `startPosition`, right? But with your approach, we
> > > > would assume the consumer hasn't received any messages.
> > >
> > > >
> > > > > > Why do we need to invoke `BlockingQueue.take` and `synchronized` in
> > > > > > the same logic? That is bad code.
> > > >
> > > > We don't need to synchronize this code here because the logic will
> > > > come after the consumer has been disconnected from broker a and before
> > > > it is connected to broker b.
> > > > If the application takes a message from the queue and the consumer then
> > > > reconnects, can the subscribe command use the right startPosition? Example:
> > > > 1. The application receives one message with `MessageId = 1`.
> > > > 2. The consumer reconnects, discovers the queue is empty, and the
> > > > lastDequeMessageId doesn't change.
> > > > 3. The consumer sends a subscribe command with MessageId.earliest, so
> > > > `MessageId = 1` will be redelivered from the broker to the client consumer,
> > > > right?
> > > >
> > > > As the example shows, the application can receive `MessageId = 1` again,
> > > > right?
> > > > We would not need to lock here because we do not enqueue new messages
> > > > after we've been disconnected from the broker and before we've sent
> > > > CommandSubscribe.
> > > > As the code [0] shows, the thread has changed.
> > > > Where do we guarantee that no new messages will come in?
> > >
> > > >
> > > > Ultimately, I think a protocol solution will yield better results,
> > > > especially since we'll want to implement this feature in the other
> > > > client languages.
> > > > The problem of resetting the cursor can be optimized in the future,
> > > > but can you ensure the correctness of all the cases I mentioned above?
> > > > IMO, if we use my design, which only changes the client,
> > > we don't need 

Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-24 Thread
Hi, PengHui

On Fri, Mar 24, 2023 at 10:16, PengHui Li wrote:
>
> Hi Bo,
>
> > Only support Consumer#redeliverUnacknowledgedMessages()
> >
> > If we redeliver individual messages, they will be filtered. Because we
> > can't clear the record latest message in the consumer when redelivering
> > individual messages. It will make this config unclear, and if every
> > redeliver method changes, it will bring a lot of redundant code, which is
> > difficult to maintain. If there is a need in the future, just support it.
>
> I think it's not correct, right? If we redeliver individual messages,
> we should not filter out any messages.
>
> And it also will be an issue for redeliverUnacknowledgedMessages()?
> The application received messages from the internal queue of the consumer,
> but the message processing failed due to some temporary failures.
> So, they want to redeliver all the received messages and try again.
> In this case, we should not filter the messages.
Only redeliverUnacknowledgedMessages() clears lastDequeMessage, so
messages redelivered through redeliverUnacknowledgedMessages() are not
filtered. Any individual redelivery does not clear lastDequeMessage, so
those messages are still filtered. Supporting individual redelivery in
the filter is difficult to implement and does not fit the cumulative
ack scenario.
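
The behavior described here can be sketched in a few lines. This is a simplified model and not the Java client: message IDs are plain integers, and the class `FilteringConsumer` with its methods is hypothetical; only the idea that a full redelivery clears the last-dequeued record comes from the discussion.

```python
class FilteringConsumer:
    """Toy consumer that drops messages at or before the last dequeued ID."""

    def __init__(self):
        self.last_dequeued = None  # stands in for lastDequeMessage

    def receive(self, message_id: int) -> bool:
        # True: handed to the application. False: filtered as a duplicate.
        if self.last_dequeued is not None and message_id <= self.last_dequeued:
            return False
        self.last_dequeued = message_id
        return True

    def redeliver_unacknowledged_messages(self) -> None:
        # Full redelivery clears the record, so replayed messages pass again.
        self.last_dequeued = None


consumer = FilteringConsumer()
first_pass = [consumer.receive(i) for i in (1, 2, 3)]

# After a reconnect the broker replays 2 and 3 before sending 4.
after_reconnect = [consumer.receive(i) for i in (2, 3, 4)]

# An individual redelivery of message 3 is still filtered.
individual_redelivery = consumer.receive(3)

# Only the full redelivery path lets message 3 through again.
consumer.redeliver_unacknowledged_messages()
after_full_redelivery = consumer.receive(3)
```

The single integer comparison is what makes the filter cheap, and it is also why it only fits ordered, cumulative-ack style consumption.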

>
> And for a failover subscription. The active consumer might be changed after
> the disconnection. We can't ensure the consumer will not receive
> duplicate messages, right? If yes, we should mention it in the proposal.

good point!
Yes, I will add it to the proposal.

Thanks,
Bo
>
> Thanks,
> Penghui
>
> On Wed, Mar 22, 2023 at 3:28 PM 丛搏  wrote:
>
> > Hi, Michael:
> >
> > >
> > > Is this the underlying motivation for the PIP?
> > >
> > > From my perspective, the PIP is seeking to decrease duplicate messages
> > > experienced due to disconnections from the broker.
> > This PIP is not aimed at reducing message duplication but at
> > completely preventing it.
> > Therefore we have to consider all edge cases, including redeliver,
> > reconnection, and receive.
> > We must fully guarantee that the message will not be received repeatedly.
> > >
> > > > The problem of the resetting cursor can be optimized in the future
> > >
> > > Why should we push off solving this problem? It seems fundamental to
> > > this PIP and should not be ignored. At the very least, I think we need
> > > to have an idea of what the future solution would be before we defer
> > > its implementation.
> > Correctness is difficult to guarantee when the cursor is reset. With
> > cumulative ack, errors can occur: if the client consumer sends a cumulative
> > ack after the cursor has been reset, we will lose some messages.
> > There are more edge cases. I just want to keep the problem simple.
> > At least with transactions, I think the current design works well: it
> > will not make mistakes, and it will not affect your current thinking.
> > You can do what you want, I can do what I want.
> >
> > Thanks,
> > Bo
> > >
> > > Thanks,
> > > Michael
> > >
> > >
> > > On Tue, Mar 21, 2023 at 10:52 PM 丛搏  wrote:
> > > >
> > > > Hi, Michael
> > > > > In this case, the consumer does not have the source of truth for the
> > > > > readPosition. It would leave the new protocol field for
> > `readPosition`
> > > > > empty and the broker would use its source of truth for the read
> > > > > position.
> > > > application has received all the messages by application thread. we
> > also need a
> > > > correct `startPosition`, right? but in your way, we will think about
> > > > the consumer
> > > > hasn't received any messages.
> > > >
> > > > >
> > > > > > why do we need to invoke `BlockingQueue.take` and `synchronized`
> > in the
> > > > > > same logic? it's a bad code.
> > > > >
> > > > > We don't need to synchronize this code here because the logic will
> > > > > come after the consumer has been disconnected from broker a and
> > before
> > > > > it is connected to broker b.
> > > > The application takes a message from the queue then reconnect,
> > > > the SubCommond can use the right startPostion? example:
> > > > 1. application receives one message with `MessageId = 1`
> > > > 2. consumer reconnect discovers the queue is empty, and the
> > > > la

Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-24 Thread
Hi, Yunze :

>
> > At least it will simplify the process of using cumulative ack with the 
> > transaction.
>
> So the transaction use cases assume the cursor will never be reset by
> other applications? We cannot ignore the impact brought by this
> proposal that resetting the cursor could have an unexpected effect.

I think the reset-cursor problem and this discussion are two separate
problems. Even without this proposal, resetting the cursor is still a
problem.

>
> If the transaction use case is the underlying motivation, you should
> not add the configuration for regular use. Instead, you should explain
> how this feature could simplify the transaction use case, rather than
> pretending to solve a problem for regular scenarios.
>
> And I agree with Michael that the problem of the resetting cursor
> should not be ignored. I've seen many times that someone said "let's
> fix this issue later" and he never went back to the issue.

Transactions are just one aspect. What we want to guarantee is that
messages are not delivered repeatedly, which goes beyond transaction
needs.
>
> I'm also curious about whether adding the position in
> `CommandSubscribe` could solve the problem with resetting the cursor.
> I'm wondering if the broker can send the position to all connected
> consumers when a consumer seeks a position or an admin resets the
> cursor. Then consumers can update the internal received position that
> is added in this proposal.
This is difficult to solve. It may require an epoch to be added when
the message is pushed. Simply notifying the client cannot solve the
case where the application has already received the message from the
consumer.
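
To make the epoch idea concrete, here is one possible shape of it. Everything in this sketch is an assumption for illustration: the class `EpochFilteringConsumer`, the integer message IDs, and the rule that a higher epoch clears the received position are hypothetical and are not part of the proposal or of the Pulsar protocol as discussed here.

```python
class EpochFilteringConsumer:
    """Toy consumer that filters duplicates per epoch (hypothetical design)."""

    def __init__(self):
        self.epoch = 0
        self.last_received = None

    def on_message(self, epoch: int, message_id: int) -> bool:
        # Messages pushed before the latest reset carry an older epoch: drop them.
        if epoch < self.epoch:
            return False
        # A newer epoch means the cursor was reset: forget the old position.
        if epoch > self.epoch:
            self.epoch = epoch
            self.last_received = None
        # Within one epoch, filter anything at or before the last received ID.
        if self.last_received is not None and message_id <= self.last_received:
            return False
        self.last_received = message_id
        return True


consumer = EpochFilteringConsumer()
before_reset = [consumer.on_message(0, i) for i in (1, 2, 3)]

# The cursor is reset to message 2; the broker now pushes with epoch 1.
replayed = consumer.on_message(1, 2)        # accepted, although 2 <= 3
stale_in_flight = consumer.on_message(0, 4)  # pushed before the reset, dropped
duplicate = consumer.on_message(1, 2)        # duplicate within epoch 1
next_message = consumer.on_message(1, 3)
```

As noted above, this still does not undo messages the application already took before the reset; it only keeps the filter from discarding the replay.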

Thanks,
Bo
>
> Thanks,
> Yunze
>
> On Fri, Mar 24, 2023 at 10:16 AM PengHui Li  wrote:
> >
> > Hi Bo,
> >
> > > Only support Consumer#redeliverUnacknowledgedMessages()
> > >
> > > > If we redeliver individual messages, they will be filtered. Because we
> > > > can't clear the record latest message in the consumer when redelivering
> > > > individual messages. It will make this config unclear, and if every
> > > > redeliver method changes, it will bring a lot of redundant code, which is
> > > > difficult to maintain. If there is a need in the future, just support it.
> >
> > I think it's not correct, right? If we redeliver individual messages,
> > we should not filter out any messages.
> >
> > And it also will be an issue for redeliverUnacknowledgedMessages()?
> > The application received messages from the internal queue of the consumer,
> > but the message processing failed due to some temporary failures.
> > So, they want to redeliver all the received messages and try again.
> > In this case, we should not filter the messages.
> >
> > And for a failover subscription. The active consumer might be changed after
> > the disconnection. We can't ensure the consumer will not receive
> > duplicate messages, right? If yes, we should mention it in the proposal.
> >
> > Thanks,
> > Penghui
> >
> > On Wed, Mar 22, 2023 at 3:28 PM 丛搏  wrote:
> >
> > > Hi, Michael:
> > >
> > > >
> > > > Is this the underlying motivation for the PIP?
> > > >
> > > > From my perspective, the PIP is seeking to decrease duplicate messages
> > > > experienced due to disconnections from the broker.
> > > This PIP is not aimed at reducing message duplication but at
> > > completely preventing it.
> > > Therefore we have to consider all edge cases, including redeliver,
> > > reconnection, and receive.
> > > We must fully guarantee that the message will not be received repeatedly.
> > > >
> > > > > The problem of the resetting cursor can be optimized in the future
> > > >
> > > > Why should we push off solving this problem? It seems fundamental to
> > > > this PIP and should not be ignored. At the very least, I think we need
> > > > to have an idea of what the future solution would be before we defer
> > > > its implementation.
> > > Correctness is difficult to guarantee when the cursor is reset. With
> > > cumulative ack, errors can occur: if the client consumer sends a cumulative
> > > ack after the cursor has been reset, we will lose some messages.
> > > There are more edge cases. I just want to keep the problem simple.
> > > At least with transactions, I think the current design works well: it
> > > will not make mistakes, and it will not affect your current thinking.
> > > You can do what you want, I can do what I want.
> >

Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-22 Thread
Hi, Michael:

>
> Is this the underlying motivation for the PIP?
>
> From my perspective, the PIP is seeking to decrease duplicate messages
> experienced due to disconnections from the broker.
This PIP is not aimed at reducing message duplication but at
completely preventing it.
Therefore we have to consider all edge cases, including redeliver,
reconnection, and receive.
We must fully guarantee that the message will not be received repeatedly.
>
> > The problem of the resetting cursor can be optimized in the future
>
> Why should we push off solving this problem? It seems fundamental to
> this PIP and should not be ignored. At the very least, I think we need
> to have an idea of what the future solution would be before we defer
> its implementation.
Correctness is difficult to guarantee when the cursor is reset. With
cumulative ack, errors can occur: if the client consumer sends a cumulative
ack after the cursor has been reset, we will lose some messages.
There are more edge cases. I just want to keep the problem simple.
At least with transactions, I think the current design works well: it
will not make mistakes, and it will not affect your current thinking.
You can do what you want, I can do what I want.
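
One reading of the reset-cursor edge case mentioned above, as a toy model. The class `ToyCursor` and its integer positions are assumptions made for illustration and are not Pulsar's cursor implementation; the sketch only shows how a cumulative ack sent for a message received before the reset can skip part of the replay.

```python
class ToyCursor:
    """Toy subscription cursor: every ID <= mark_delete counts as acknowledged."""

    def __init__(self):
        self.mark_delete = 0

    def reset(self, position: int) -> None:
        # Reset moves the acknowledged position back (or forward) unconditionally.
        self.mark_delete = position

    def cumulative_ack(self, message_id: int) -> None:
        # A cumulative ack acknowledges everything up to message_id.
        self.mark_delete = max(self.mark_delete, message_id)

    def backlog(self, last_id: int) -> list:
        # Messages the broker still has to deliver, up to last_id.
        return list(range(self.mark_delete + 1, last_id + 1))


cursor = ToyCursor()
cursor.cumulative_ack(5)   # the consumer has processed 1..5

cursor.reset(2)            # an admin rewinds the cursor to replay from 3
backlog_after_reset = cursor.backlog(10)

# The consumer still holds message 8 from before the reset and acks it.
cursor.cumulative_ack(8)
backlog_after_stale_ack = cursor.backlog(10)

# These IDs were meant to be replayed but are now skipped.
skipped = [m for m in backlog_after_reset if m not in backlog_after_stale_ack]
```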

Thanks,
Bo
>
> Thanks,
> Michael
>
>
> On Tue, Mar 21, 2023 at 10:52 PM 丛搏  wrote:
> >
> > Hi, Michael
> > > In this case, the consumer does not have the source of truth for the
> > > readPosition. It would leave the new protocol field for `readPosition`
> > > empty and the broker would use its source of truth for the read
> > > position.
> > Suppose the application thread has already received all the messages. We
> > still need a correct `startPosition`, right? But with your approach, we
> > would assume the consumer hasn't received any messages.
> >
> > >
> > > > why do we need to invoke `BlockingQueue.take` and `synchronized` in the
> > > > same logic? it's a bad code.
> > >
> > > We don't need to synchronize this code here because the logic will
> > > come after the consumer has been disconnected from broker a and before
> > > it is connected to broker b.
> > The application takes a message from the queue then reconnect,
> > the SubCommond can use the right startPostion? example:
> > 1. application receives one message with `MessageId = 1`
> > 2. consumer reconnect discovers the queue is empty, and the
> > lastDequeMessageId doesn't change.
> > 3. consumer sends a subcommand with MessageId.earliest, the `MessageId = 1`
> > will redeliver from broker to client consumer, right?
> >
> > As we can see in the example, the application also can receive
> > `MessageId = 1`, right?
> > > We would not need to lock here because we do not enqueue new messages
> > > after we've been disconnected from the broker and before we've sent
> > > CommandSubscribe.
> > we can see the code [0], the thread has changed.
> > Where do we guarantee that no new messages will come in?
> >
> > >
> > > Ultimately, I think a protocol solution will yield better results,
> > > especially since we'll want to implement this feature in the other
> > > client languages.
> > The problem of the resetting cursor can be optimized in the future,
> > but can you ensure the
> > correctness of all the cases I mentioned above? IMO, if we use my
> > design, client change,
> > we don't need the broker to make any changes. its simple and it's easy
> > to implement.
> > I can make sure it's completely correct, I can make sure it's
> > completely correct. In your design,
> > I currently do not see a closed-loop implementation that can achieve
> > at least in the java client.
> >
> > Thanks,
> > Bo
> > >
> > > Thanks,
> > > Michael
> > >
> > > On Tue, Mar 21, 2023 at 9:29 PM 丛搏  wrote:
> > > >
> > > > Hi, Michael:
> > > >
> > > > > On Tue, Mar 21, 2023 at 11:17 PM Michael Marshall wrote:
> > > >
> > > > >
> > > > > One more point. Instead of keeping track of the latest message seen by
> > > > > the application, the logic in my solution would actually just check
> > > > > the last message in the `incomingMessages` queue (as in the most
> > > > > recently added), and use that as the read position in the subscribe
> > > > > command. If we made this change, we would have to change this code [0]
> > > > > to not drop the `incomingMessages` queue.
> > > >
> > > > case 1:
> > > > What we define the message that the

Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-21 Thread
Hi, Michael
> In this case, the consumer does not have the source of truth for the
> readPosition. It would leave the new protocol field for `readPosition`
> empty and the broker would use its source of truth for the read
> position.
Suppose the application thread has already received all the messages.
We still need a correct `startPosition`, right? But with your approach,
we would conclude that the consumer hasn't received any messages.

>
> > why do we need to invoke `BlockingQueue.take` and `synchronized` in the
> > same logic? it's a bad code.
>
> We don't need to synchronize this code here because the logic will
> come after the consumer has been disconnected from broker a and before
> it is connected to broker b.
If the application takes a message from the queue and the consumer then
reconnects, can the subscribe command use the right `startPosition`? For example:
1. The application receives one message with `MessageId = 1`.
2. The consumer reconnects and discovers that the queue is empty, and
`lastDequeuedMessageId` has not changed.
3. The consumer sends a subscribe command with `MessageId.earliest`, so
`MessageId = 1` will be redelivered from the broker to the client consumer, right?

As the example shows, the application can receive `MessageId = 1`
again, right?
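
The interleaving above can be reproduced in a single-threaded sketch. `ReconnectRaceSketch` and its members are hypothetical simplifications, not the real `ConsumerImpl` code, and `EARLIEST` is an assumed stand-in for `MessageId.earliest`:

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical, simplified consumer used only to replay the interleaving. */
final class ReconnectRaceSketch {
    static final long EARLIEST = -1L; // assumed stand-in for MessageId.earliest

    private final Queue<Long> incomingMessages = new ArrayDeque<>();
    private long lastDequeuedId = EARLIEST;

    void enqueue(long messageId) {
        incomingMessages.add(messageId);
    }

    /** The application thread removes the message from the queue. */
    long takeFromQueue() {
        return incomingMessages.remove();
    }

    /** Separate bookkeeping step that records what the application has seen. */
    void markProcessed(long messageId) {
        lastDequeuedId = messageId;
    }

    /** Reconnect path: choose the position to put in the subscribe command. */
    long startPositionForResubscribe() {
        incomingMessages.clear(); // the queue is dropped on reconnect
        return lastDequeuedId;    // only the recorded id is left to consult
    }

    public static void main(String[] args) {
        ReconnectRaceSketch consumer = new ReconnectRaceSketch();
        consumer.enqueue(1L);
        long taken = consumer.takeFromQueue();               // application holds MessageId = 1
        long start = consumer.startPositionForResubscribe(); // reconnect runs before the bookkeeping
        consumer.markProcessed(taken);                       // too late for the subscribe command
        System.out.println("application holds " + taken + ", resubscribe position " + start);
    }
}
```

Unless the take and the bookkeeping happen atomically with respect to the reconnect, the subscribe command is built from a stale position and the message the application already holds is delivered again.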
> We would not need to lock here because we do not enqueue new messages
> after we've been disconnected from the broker and before we've sent
> CommandSubscribe.
As the code at [0] shows, the thread has changed.
Where do we guarantee that no new messages will come in?

>
> Ultimately, I think a protocol solution will yield better results,
> especially since we'll want to implement this feature in the other
> client languages.
The cursor-reset problem can be optimized in the future, but can you
ensure the correctness of all the cases I mentioned above? IMO, my
design only changes the client, so the broker does not need any changes.
It's simple and easy to implement, and I can make sure it's completely
correct. With your design, I currently do not see a complete,
closed-loop implementation, at least in the Java client.

Thanks,
Bo
>
> Thanks,
> Michael
>
> On Tue, Mar 21, 2023 at 9:29 PM 丛搏  wrote:
> >
> > Hi, Michael:
> >
> > On Tue, Mar 21, 2023 at 11:17 PM Michael Marshall wrote:
> >
> > >
> > > One more point. Instead of keeping track of the latest message seen by
> > > the application, the logic in my solution would actually just check
> > > the last message in the `incomingMessages` queue (as in the most
> > > recently added), and use that as the read position in the subscribe
> > > command. If we made this change, we would have to change this code [0]
> > > to not drop the `incomingMessages` queue.
> >
> > case 1:
> > What we define the message that the application has seen?
> > I think it is the[0], when the `incomingMessages` queue is empty,
> > how do we get the correct `startPosition`?
> > What I think we should lock the receive logic in [1]
> > ```
> > synchronized (this) {
> >     message = incomingMessages.take();
> >     messageProcessed(message);
> > }
> > ```
> > why do we need to invoke `BlockingQueue.take` and `synchronized` in the
> > same logic? it's a bad code.
> >
> > case 2:
> > If we sub with `startMessageId`, we also should lock any enqueue
> > logic, like [2] and
> > check to consumer's current state
> > ```
> > synchronized (this) {
> >     if (consumer.isConnected) {
> >         if (canEnqueueMessage(message) && incomingMessages.offer(message)) {
> >             // After we have enqueued the messages on `incomingMessages` queue,
> >             // we cannot touch the message instance anymore, since for pooled
> >             // messages, this instance was possibly already been released
> >             // and recycled.
> >             INCOMING_MESSAGES_SIZE_UPDATER.addAndGet(this, messageSize);
> >             getMemoryLimitController().ifPresent(limiter ->
> >                     limiter.forceReserveMemory(messageSize));
> >             updateAutoScaleReceiverQueueHint();
> >         }
> >     }
> > }
> > ```
> > case 3:
> > when we subcommand sends to broker with `startMessageId = 1`, then the
> > broker push message
> > has not yet entered `incommingQueue`, the application invokes
> > redeliver. in this way, we don't
> > filter messages are correct, right?
> >
> > These are some cases that I simply thought of, and there must be
> > others that I haven't thought
> > of. Are you sure we can handle these problems correctly?
> >
> > > The problem of "the consumer doesn't know" seems like something that
> > 

Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-21 Thread
Hi, Yunze:

It is true that hiding these details in the SDK still requires users to have a
certain level of understanding of the configuration. However, this approach
can still be helpful for users who want to use the feature but want
to avoid dealing with the nitty-gritty details of implementation.

At the very least, it will simplify using cumulative ack with
transactions.

Thanks,
Bo

On Wed, Mar 22, 2023 at 10:32 AM Yunze Xu wrote:
>
> I just missed the point that the reset cursor operations do not work
> for the consumer. IIUC, the seek operation does not work as well. Then
> I think the option is not user-friendly as the PIP says:
>
> >  It needs to be enabled with a complete understanding of this configuration.
>
> If users want, they can also record the latest position for each
> consumer at the application side and filter the messages by the public
> `MessageId#compareTo` API. If hiding these details in SDK still
> requires users to know these details, I think it would not be better
> than doing that explicitly in the application.
>
> Thanks,
> Yunze
>
> On Wed, Mar 22, 2023 at 10:29 AM 丛搏  wrote:
> >
> > Hi, Michael:
> >
> > On Tue, Mar 21, 2023 at 11:17 PM Michael Marshall wrote:
> >
> > >
> > > One more point. Instead of keeping track of the latest message seen by
> > > the application, the logic in my solution would actually just check
> > > the last message in the `incomingMessages` queue (as in the most
> > > recently added), and use that as the read position in the subscribe
> > > command. If we made this change, we would have to change this code [0]
> > > to not drop the `incomingMessages` queue.
> >
> > case 1:
> > What we define the message that the application has seen?
> > I think it is the[0], when the `incomingMessages` queue is empty,
> > how do we get the correct `startPosition`?
> > What I think we should lock the receive logic in [1]
> > ```
> > synchronized (this) {
> >     message = incomingMessages.take();
> >     messageProcessed(message);
> > }
> > ```
> > why do we need to invoke `BlockingQueue.take` and `synchronized` in the
> > same logic? it's a bad code.
> >
> > case 2:
> > If we sub with `startMessageId`, we also should lock any enqueue
> > logic, like [2] and
> > check to consumer's current state
> > ```
> > synchronized (this) {
> >     if (consumer.isConnected) {
> >         if (canEnqueueMessage(message) && incomingMessages.offer(message)) {
> >             // After we have enqueued the messages on `incomingMessages` queue,
> >             // we cannot touch the message instance anymore, since for pooled
> >             // messages, this instance was possibly already been released
> >             // and recycled.
> >             INCOMING_MESSAGES_SIZE_UPDATER.addAndGet(this, messageSize);
> >             getMemoryLimitController().ifPresent(limiter ->
> >                     limiter.forceReserveMemory(messageSize));
> >             updateAutoScaleReceiverQueueHint();
> >         }
> >     }
> > }
> > ```
> > case 3:
> > when we subcommand sends to broker with `startMessageId = 1`, then the
> > broker push message
> > has not yet entered `incommingQueue`, the application invokes
> > redeliver. in this way, we don't
> > filter messages are correct, right?
> >
> > These are some cases that I simply thought of, and there must be
> > others that I haven't thought
> > of. Are you sure we can handle these problems correctly?
> >
> > > The problem of "the consumer doesn't know" seems like something that
> > > is reasonably within the protocol's responsibilities. In this case, an
> > > event happens on the broker, and the broker can tell the consumer.
> >
> > I don't think a simple change protocol can solve these problems,
> > We can't promise that every consumer can receive the broker reset
> > cursor request.
> > When the consumer reconnects, the broker can't send the reset cursor 
> > request to
> > the client consumers, right? In this case, the consumer is still unaware, 
> > right?
> >
> >
> > [0] 
> > https://github.com/apache/pulsar/blob/30d2469086fea989ac8baf059df8e69c66a68d89/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L135
> > [1] 
> > https://github.com/apache/pulsar/blob/30d2469086fea989ac8baf059df8e69c66a68d89/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L440-L454
> > [2] 
> > https://github.com/apache/pulsar/blob/30d2469086fea989ac8baf059df8e69c66a68d89/pulsar-client/src/main/java

Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-21 Thread
> > The problem of "the consumer doesn't know" seems like something that
> > is reasonably within the protocol's responsibilities. In this case, an
> > event happens on the broker, and the broker can tell the consumer.
> >
> > > * Consumers should close when the server resets the cursor,
> > > * when the cursor reset success, and then restart. Otherwise,
> > > * the consumer will not receive the history messages.
> >
> > This is introducing a confusing edge case that requires reading a
> > Javadoc in order to understand. That seems risky to me, and I do not
> > think we should add such an edge case. A new protocol message would
> > easily handle it and make it transparent to the application.
> >
> > Thanks,
> > Michael
> >
> > [0] 
> > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L826-L912
> > [1] 
> > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L1870-L1876
> > [2] 
> > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L789-L795
> > [3] 
> > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L922-L960
> >
> > On Tue, Mar 21, 2023 at 8:58 AM Yubiao Feng
> >  wrote:
> > >
> > > +1
> > >
> > > Hi, Bo :
> > >
> > > Thanks for your explanation. That makes sense to me.
> > >
> > > Thanks,
> > > Yubiao Feng
> > >
> > > On Mon, Mar 20, 2023 at 10:21 PM 丛搏  wrote:
> > >
> > > > Hi, pulsar community:
> > > >
> > > > I started a PIP about `Client consumer filter received messages`.
> > > >
> > > > PIP: https://github.com/apache/pulsar/issues/19864
> > > >
> > > > Thanks,
> > > > Bo
> > > >


Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-21 Thread
Hi, Aloys:

Yes, it will work with `consumer.seek()`.
Sorry, I missed this, I will add this description to the PIP.

But the current seek method has some problems; the details are in
https://lists.apache.org/thread/97o9t4ltkds5pfq41l9xbbd31t41qm8w,
so I am not sure whether it makes sense to support the seek method in this PIP.

Thanks,
Bo

On Tue, Mar 21, 2023 at 7:08 PM Aloys Zhang wrote:
>
> Nice proposal.
>
> I'm interested in a point
> >  So when we need to reset the cursor, the client consumer should all be
> closed, and then reset the cursor then restart the consumer.
>
> Does this requirement apply to `consumer.seek`?
> Because in some scenarios, we need to create consumers first and then seek
> a position or timestamp.
>
>
> On Tue, Mar 21, 2023 at 5:19 PM Yunze Xu wrote:
>
> > First, I agree with Yubiao that we can avoid calling the `isDuplicate`
> > method once this option is enabled.
> >
> > Then, I'm wondering in which case would users want to disable this
> > option? What's the disadvantage to disable the option? I think we can
> > just record the latest position (ledger id, entry id, batch index) of
> > the message received if the subscription type is Exclusive or
> > Failover.
> >
> > Is there any breaking change if we just apply this filter without
> > adding a configuration option?
> >
> > Thanks,
> > Yunze
> >
> > On Tue, Mar 21, 2023 at 2:26 PM 丛搏  wrote:
> > >
> > > Hi, Michael
> > >
> > > On Tue, Mar 21, 2023 at 1:03 PM Michael Marshall wrote:
> > > >
> > > > This is a great problem to improve.
> > > >
> > > > What if we instead expand the CommandSubscribe [0] protocol message
> > > > with a new field to represent the client's desired read position? This
> > > > way, the client can tell the second broker where to start sending
> > > > messages, and there is no need to send the messages twice.
> > > >
> > > > I like the protocol expansion because it saves on unnecessary network
> > > > transfer in several places and because it will be more straightforward
> > > > for clients in other languages to implement.
> > > >
> > > > What do you think?
> > > if we add the new field in CommandSubscribe, we should ensure
> > > the synchronization between consumer reconnection and user
> > > calling receive and redeliverUnack method. it will affect the performance
> > > of receive. expose synchronization to hot paths it not a good idea.
> > > Although the message is re-delivered twice, I don't think it
> > > will cause too much performance loss.
> > >
> > > This filtering is rigorous, and there cannot be some race condition
> > problems
> > > because it involves transactions. I want it to be simple and efficient,
> > > and I don't want it to become complicated and difficult to maintain.
> > >
> > > Of course, if the failover and exclusive consumers are changed to pull
> > mode,
> > > I believe that the change protocol is a very good idea. But at present,
> > > there is obviously no sufficient reason to do so.
> > >
> > > Thanks,
> > > Bo
> > >
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > > [0]
> > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-common/src/main/proto/PulsarApi.proto#L339-L400
> > > >
> > > >
> > > > On Mon, Mar 20, 2023 at 10:56 AM Xiangying Meng 
> > wrote:
> > > > >
> > > > > Hi Congbo,
> > > > > I think this is a great idea.
> > > > > This is more efficient in filtering duplicate messages for a single
> > > > > consumer.
> > > > > And maybe more details about implementation should be shown in the
> > proposal.
> > > > >
> > > > > Best regards,
> > > > > Xiangying
> > > > >
> > > > > On Mon, Mar 20, 2023 at 10:53 PM Yubiao Feng
> > > > >  wrote:
> > > > >
> > > > > > Hi Bo
> > > > > >
> > > > > > I think this is a good way to filter messages that the client has
> > received.
> > > > > >
> > > > > > And I have two questions:
> > > > > >
> > > > > > 1. This is more powerful than the original way
> > > > > > (`acknowledgmentsGroupingTracker.isDuplicate(msgId)) to filter out
> > > > > > duplicated messages.
> > > >

Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-21 Thread
Hi, Yunze:

On Tue, Mar 21, 2023 at 5:19 PM Yunze Xu wrote:
>
>
> Is there any breaking change if we just apply this filter without
> adding a configuration option?

If we do not add this configuration, a cursor reset through Pulsar Admin
will cause wrong behavior: the filter will drop the messages that the
reset was meant to replay.

As described in the Compatibility section of the PIP, the client consumer
does not know that Pulsar Admin has reset the cursor.
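
A small sketch of that failure mode, assuming message ids are plain increasing numbers and the filter drops anything at or below the last received id. `ResetCursorSketch` is a hypothetical name for illustration, not part of the Pulsar client:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical consumer with an optional "filter received messages" switch. */
final class ResetCursorSketch {
    private final boolean filterEnabled;
    private long lastReceived = -1L;

    ResetCursorSketch(boolean filterEnabled) {
        this.filterEnabled = filterEnabled;
    }

    /** Simulates the broker pushing ids [from, to]; returns what the application sees. */
    List<Long> deliver(long fromInclusive, long toInclusive) {
        List<Long> delivered = new ArrayList<>();
        for (long id = fromInclusive; id <= toInclusive; id++) {
            if (filterEnabled && id <= lastReceived) {
                continue; // the client cannot tell a reset replay from a duplicate
            }
            lastReceived = Math.max(lastReceived, id);
            delivered.add(id);
        }
        return delivered;
    }

    public static void main(String[] args) {
        ResetCursorSketch consumer = new ResetCursorSketch(true);
        consumer.deliver(0, 10);                      // normal consumption
        System.out.println(consumer.deliver(5, 10));  // replay after an admin reset to id 5
    }
}
```

With the filter always on, the replay after the reset is silently dropped, which is why the behavior sits behind a configuration option.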
>
> Thanks,
> Yunze
>
> On Tue, Mar 21, 2023 at 2:26 PM 丛搏  wrote:
> >
> > Hi, Michael
> >
> > On Tue, Mar 21, 2023 at 1:03 PM Michael Marshall wrote:
> > >
> > > This is a great problem to improve.
> > >
> > > What if we instead expand the CommandSubscribe [0] protocol message
> > > with a new field to represent the client's desired read position? This
> > > way, the client can tell the second broker where to start sending
> > > messages, and there is no need to send the messages twice.
> > >
> > > I like the protocol expansion because it saves on unnecessary network
> > > transfer in several places and because it will be more straightforward
> > > for clients in other languages to implement.
> > >
> > > What do you think?
> > if we add the new field in CommandSubscribe, we should ensure
> > the synchronization between consumer reconnection and user
> > calling receive and redeliverUnack method. it will affect the performance
> > of receive. expose synchronization to hot paths it not a good idea.
> > Although the message is re-delivered twice, I don't think it
> > will cause too much performance loss.
> >
> > This filtering is rigorous, and there cannot be some race condition problems
> > because it involves transactions. I want it to be simple and efficient,
> > and I don't want it to become complicated and difficult to maintain.
> >
> > Of course, if the failover and exclusive consumers are changed to pull mode,
> > I believe that the change protocol is a very good idea. But at present,
> > there is obviously no sufficient reason to do so.
> >
> > Thanks,
> > Bo
> >
> > >
> > > Thanks,
> > > Michael
> > >
> > > [0] 
> > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-common/src/main/proto/PulsarApi.proto#L339-L400
> > >
> > >
> > > On Mon, Mar 20, 2023 at 10:56 AM Xiangying Meng  
> > > wrote:
> > > >
> > > > Hi Congbo,
> > > > I think this is a great idea.
> > > > This is more efficient in filtering duplicate messages for a single
> > > > consumer.
> > > > And maybe more details about implementation should be shown in the 
> > > > proposal.
> > > >
> > > > Best regards,
> > > > Xiangying
> > > >
> > > > On Mon, Mar 20, 2023 at 10:53 PM Yubiao Feng
> > > >  wrote:
> > > >
> > > > > Hi Bo
> > > > >
> > > > > I think this is a good way to filter messages that the client has 
> > > > > received.
> > > > >
> > > > > And I have two questions:
> > > > >
> > > > > 1. This is more powerful than the original way
> > > > > (`acknowledgmentsGroupingTracker.isDuplicate(msgId)) to filter out
> > > > > duplicated messages.
> > > > >  Is it possible to turn off the original de-replay logic to improve
> > > > > performance after enabling this new feature?
> > > > >
> > > > > 2. There should be a typo in the article
> > > > >
> > > > > > ## Only support Consumer#redeliverUnacknowledgedMessages()
> > > > > > If we redeliver individual messages, they will be filtered. Because 
> > > > > > we
> > > > > can't clear the record latest message
> > > > > >in the consumer when redelivering individual messages. It will make 
> > > > > >this
> > > > > config unclear, and if every redeliver
> > > > > > method changes, it will bring a lot of redundant code, which is 
> > > > > > difficult
> > > > > to maintain. If there is a need in the
> > > > > > future, just support it.
> > > > >
> > > > > I suppose you want to say not support 
> > > > > `redeliverUnacknowledgedMessages`,
> > > > > right?
> > > > >
> > > > >
> > > > > Thanks
> > > > > Yubiao Feng
> > > > >
> > > > > On Mon, Mar 20, 2023 at 10:21 PM 丛搏  wrote:
> > > > >
> > > > > > Hi, pulsar community:
> > > > > >
> > > > > > I started a PIP about `Client consumer filter received messages`.
> > > > > >
> > > > > > PIP: https://github.com/apache/pulsar/issues/19864
> > > > > >
> > > > > > Thanks,
> > > > > > Bo
> > > > > >
> > > > >


Re: [VOTE] PIP-259: Make the config httpMaxRequestHeaderSize of the pulsar web server to configurable

2023-03-21 Thread
+1 (binding)

Thanks,
Bo

On Mon, Mar 20, 2023 at 5:11 PM Yubiao Feng wrote:
>
> Hi Pulsar Community
>
> This thread is to start the vote for PIP 259.
>
> Discussion: https://lists.apache.org/thread/f11cld5cbc8sodhgvs5s28lw8nxsr9dc
> Issue: https://github.com/apache/pulsar/issues/19826
> Implementation: https://github.com/apache/pulsar/pull/19514
>
>
> Voting will stay open for at least 48h.
>
> Thanks,
> Yubiao Feng


Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-21 Thread
Hi, Michael

On Tue, Mar 21, 2023 at 1:03 PM Michael Marshall wrote:
>
> This is a great problem to improve.
>
> What if we instead expand the CommandSubscribe [0] protocol message
> with a new field to represent the client's desired read position? This
> way, the client can tell the second broker where to start sending
> messages, and there is no need to send the messages twice.
>
> I like the protocol expansion because it saves on unnecessary network
> transfer in several places and because it will be more straightforward
> for clients in other languages to implement.
>
> What do you think?
If we add the new field to CommandSubscribe, we must synchronize
consumer reconnection with the user's calls to the receive and
redeliverUnack methods. That will affect the performance of receive;
exposing synchronization on hot paths is not a good idea.
Although the message is delivered twice, I don't think it
will cause too much performance loss.

This filtering must be rigorous and cannot have any race conditions,
because it involves transactions. I want it to be simple and efficient,
and I don't want it to become complicated and difficult to maintain.

Of course, if the failover and exclusive consumers are changed to pull mode,
I believe that changing the protocol is a very good idea. But at present,
there is no sufficient reason to do so.

Thanks,
Bo

>
> Thanks,
> Michael
>
> [0] 
> https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-common/src/main/proto/PulsarApi.proto#L339-L400
>
>
> On Mon, Mar 20, 2023 at 10:56 AM Xiangying Meng  wrote:
> >
> > Hi Congbo,
> > I think this is a great idea.
> > This is more efficient in filtering duplicate messages for a single
> > consumer.
> > And maybe more details about implementation should be shown in the proposal.
> >
> > Best regards,
> > Xiangying
> >
> > On Mon, Mar 20, 2023 at 10:53 PM Yubiao Feng
> >  wrote:
> >
> > > Hi Bo
> > >
> > > I think this is a good way to filter messages that the client has 
> > > received.
> > >
> > > And I have two questions:
> > >
> > > 1. This is more powerful than the original way
> > > (`acknowledgmentsGroupingTracker.isDuplicate(msgId)) to filter out
> > > duplicated messages.
> > >  Is it possible to turn off the original de-replay logic to improve
> > > performance after enabling this new feature?
> > >
> > > 2. There should be a typo in the article
> > >
> > > > ## Only support Consumer#redeliverUnacknowledgedMessages()
> > > > If we redeliver individual messages, they will be filtered. Because we
> > > can't clear the record latest message
> > > >in the consumer when redelivering individual messages. It will make this
> > > config unclear, and if every redeliver
> > > > method changes, it will bring a lot of redundant code, which is 
> > > > difficult
> > > to maintain. If there is a need in the
> > > > future, just support it.
> > >
> > > I suppose you want to say not support `redeliverUnacknowledgedMessages`,
> > > right?
> > >
> > >
> > > Thanks
> > > Yubiao Feng
> > >
> > > On Mon, Mar 20, 2023 at 10:21 PM 丛搏  wrote:
> > >
> > > > Hi, pulsar community:
> > > >
> > > > I started a PIP about `Client consumer filter received messages`.
> > > >
> > > > PIP: https://github.com/apache/pulsar/issues/19864
> > > >
> > > > Thanks,
> > > > Bo
> > > >
> > >


Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-20 Thread
Hi, Xiangying

On Mon, Mar 20, 2023 at 11:56 PM Xiangying Meng wrote:
>
> Hi Congbo,
> I think this is a great idea.
> This is more efficient in filtering duplicate messages for a single
> consumer.
> And maybe more details about implementation should be shown in the proposal.

We add the public interface in the PIP; the code details need to be
reviewed in the PR, not in the PIP.

>
> Best regards,
> Xiangying
>
> On Mon, Mar 20, 2023 at 10:53 PM Yubiao Feng
>  wrote:
>
> > Hi Bo
> >
> > I think this is a good way to filter messages that the client has received.
> >
> > And I have two questions:
> >
> > 1. This is more powerful than the original way
> > (`acknowledgmentsGroupingTracker.isDuplicate(msgId)) to filter out
> > duplicated messages.
> >  Is it possible to turn off the original de-replay logic to improve
> > performance after enabling this new feature?
> >
> > 2. There should be a typo in the article
> >
> > > ## Only support Consumer#redeliverUnacknowledgedMessages()
> > > If we redeliver individual messages, they will be filtered. Because we
> > can't clear the record latest message
> > >in the consumer when redelivering individual messages. It will make this
> > config unclear, and if every redeliver
> > > method changes, it will bring a lot of redundant code, which is difficult
> > to maintain. If there is a need in the
> > > future, just support it.
> >
> > I suppose you want to say not support `redeliverUnacknowledgedMessages`,
> > right?
> >
> >
> > Thanks
> > Yubiao Feng
> >
> > On Mon, Mar 20, 2023 at 10:21 PM 丛搏  wrote:
> >
> > > Hi, pulsar community:
> > >
> > > I started a PIP about `Client consumer filter received messages`.
> > >
> > > PIP: https://github.com/apache/pulsar/issues/19864
> > >
> > > Thanks,
> > > Bo
> > >
> >


Re: [DISCUSS] PIP-260: Client consumer filter received messages

2023-03-20 Thread
Hi, Yubiao :

On Mon, Mar 20, 2023 at 10:53 PM Yubiao Feng wrote:
>
> Hi Bo
>
> I think this is a good way to filter messages that the client has received.
>
> And I have two questions:
>
> 1. This is more powerful than the original way
> (`acknowledgmentsGroupingTracker.isDuplicate(msgId)) to filter out
> duplicated messages.
>  Is it possible to turn off the original de-replay logic to improve
> performance after enabling this new feature?
>
Good question! Yes, the original check
(`acknowledgmentsGroupingTracker.isDuplicate(msgId)`) also filters
duplicated messages, so we can turn off the original de-replay logic to
improve performance. This PIP only considers the functionality; that
improvement can be done later.

> 2. There should be a typo in the article
>
> > ## Only support Consumer#redeliverUnacknowledgedMessages()
> > If we redeliver individual messages, they will be filtered. Because we
> can't clear the record latest message
> >in the consumer when redelivering individual messages. It will make this
> config unclear, and if every redeliver
> > method changes, it will bring a lot of redundant code, which is difficult
> to maintain. If there is a need in the
> > future, just support it.
>
> I suppose you want to say not support `redeliverUnacknowledgedMessages`,
> right?
We only support `redeliverUnacknowledgedMessages`, which clears the record
in the client consumer.
The other redeliver methods are not supported.
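
As a rough sketch of that rule, assuming numeric message ids: only the bulk redeliver call clears the recorded position, so individually redelivered messages stay filtered. `RedeliverFilterSketch` and `redeliverSingle` are hypothetical names chosen for illustration, not Pulsar client APIs:

```java
/** Hypothetical filter showing which redeliver path clears the record. */
final class RedeliverFilterSketch {
    private static final long NONE = -1L;
    private long lastReceived = NONE;

    /** Returns true if the message should be handed to the application. */
    synchronized boolean accept(long messageId) {
        if (messageId <= lastReceived) {
            return false; // at or before the recorded position: filtered
        }
        lastReceived = messageId;
        return true;
    }

    /** Bulk redelivery: forget the record so replayed messages pass the filter. */
    synchronized void redeliverUnacknowledgedMessages() {
        lastReceived = NONE;
    }

    /** Individual redelivery: the record is kept, so the replay is filtered. */
    synchronized void redeliverSingle(long messageId) {
        // Intentionally a no-op in this sketch.
    }

    public static void main(String[] args) {
        RedeliverFilterSketch filter = new RedeliverFilterSketch();
        filter.accept(1L);
        filter.accept(2L);
        filter.redeliverSingle(1L);
        System.out.println(filter.accept(1L)); // still filtered
        filter.redeliverUnacknowledgedMessages();
        System.out.println(filter.accept(1L)); // passes after the record is cleared
    }
}
```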
>
>
> Thanks
> Yubiao Feng
>
> On Mon, Mar 20, 2023 at 10:21 PM 丛搏  wrote:
>
> > Hi, pulsar community:
> >
> > I started a PIP about `Client consumer filter received messages`.
> >
> > PIP: https://github.com/apache/pulsar/issues/19864
> >
> > Thanks,
> > Bo
> >


[DISCUSS] PIP-260: Client consumer filter received messages

2023-03-20 Thread
Hi, pulsar community:

I started a PIP about `Client consumer filter received messages`.

PIP: https://github.com/apache/pulsar/issues/19864

Thanks,
Bo


Re: [DISCUSS] PIP-259: Make the config httpMaxRequestHeaderSize of the pulsar web server to configurable

2023-03-17 Thread
+1

Hi, Yubiao :

Thanks for your explanation. That makes sense to me.

Thanks,
Bo


On Fri, Mar 17, 2023 at 4:29 PM Yubiao Feng wrote:
>
> Hi Bo
>
> > I have a question, why we need `httpClientRequestBufferSize ` in
> > proxy, can you explain in detail?
>
> Since The Pulsar-Proxy uses the tool `jetty-client` to forward HTTP
> requests from users to The Pulsar-Broker, if the proxy receives a request
> like this:
>
> ```
> GET /admin/v2/public/default/tp.long..long/stats HTTP/1.1
> ```
>
> The internal client with forward this request like this:
>
> ```
> ByteBuf buf = allocate( config.httpClientRequestBufferSize )
> buf.write(requestLine);  // (Highlight) we will get
> a BufferOverflowException if the request line is too long.
> ```
>
> Therefore, in addition to ensuring that the proxy server can receive a long
> request line, the internal client must also process a long request line.
> And this problem can be solved by making configuration
> `httpClientRequestBufferSize` configurable.
>
>
> Thanks
> Yubiao Feng
>
>
> On Thu, Mar 16, 2023 at 8:12 PM 丛搏  wrote:
>
> > hi yubiao :
> >
> > I have a question, why we need `httpClientRequestBufferSize ` in
> > proxy, can you explain in detail?
> >
> > Thanks,
> > Bo
> >
> > On Thu, Mar 16, 2023 at 12:11 AM Yubiao Feng wrote:
> >
> > >
> > > Hi community
> > >
> > > I am starting a DISCUSS for "PIP-259: Make the config
> > > httpMaxRequestHeaderSize of the pulsar web server configurable".
> > >
> > > ### Motivation
> > >
> > > We have two ways to manage pulsar's resources:
> > > - By client API (Can manage some resources, such as `create topic`,
> > `create
> > > subscriber`, and so on)
> > > - By admin API (Can manage all the resources)
> > >
> > > The `client API` has no limit on the request length. And the `admin API`
> > > has a limit on the request length(such as HTTP request line and HTTP
> > > request headers), this restriction is done by the built-in web container
> > > Jetty.
> > >
> > > Almost resources can be created by two APIs, but can only be modified and
> > > deleted by `admin API`. This causes us to be unable to modify or delete
> > > resources created by `client API` with too long a name because it exceeds
> > > Jetty's default HTTP request URI length limit.
> > >
> > > ### Goal
> > >
> > >  1. For web servers
> > > Provide a way to modify Jetty's `httpMaxRequestHeaderSize` configuration
> > > (involves two servers: the web server in pulsar and the web server in
> > > pulsar-proxy)
> > >
> > >  2.For the internal client in pulsar-proxy
> > > Provide a way to modify Jetty-client's `httpClientRequestBufferSize`
> > > configuration.
> > >
> > > Since the pulsar-proxy handles HTTP requests like this: `pulsar-admin.sh`
> > > -> `proxy web server` -> `(highlight) internal client in proxy` ->
> > `pulsar
> > > web server`.
> > >
> > > When the internal client forwards a request, it forwards the request
> > header
> > > and the request body, and all the data passes through a buffer( we call
> > it
> > > Buf ), like this:
> > > - Receive a request
> > > - Put the request line and request headers input to the Buf.
> > > - (highlight)Flush the Buf ( If the data in the request
> > > line and request header exceeds the length of the buf, an error is
> > reported
> > > )
> > > - Put the request body input to the Buf.
> > > - Flush the Buf if it is full.
> > >
> > > So we need a config to set the `buff size` of the Buf:
> > > `pulsar-proxy.conf.httpClientRequestBufferSize` -> `buf size of the
> > > internal client`.
> > >
> > > ### API Changes
> > >
> > >  ServiceConfiguration.java
> > > ```java
> > >@FieldContext(
> > > category = CATEGORY_HTTP,
> > > doc = """
> > > The maximum size in bytes of the request header.
> > > Larger headers will allow for more and/or larger cookies
> > > plus larger form content encoded in a URL.
> > > However, larger headers consume more memory and can make
> > a
> > > server more vulnerable to denial of service
> > > attacks.
> > >   """
> > > )
> > >private int

Re: [DISCUSS] PIP-259: Make the config httpMaxRequestHeaderSize of the pulsar web server to configurable

2023-03-16 Thread
hi yubiao :

I have a question, why we need `httpClientRequestBufferSize ` in
proxy, can you explain in detail?

Thanks,
Bo

Yubiao Feng wrote on Thu, Mar 16, 2023 at 00:11:

>
> Hi community
>
> I am starting a DISCUSS for "PIP-259: Make the config
> httpMaxRequestHeaderSize of the pulsar web server configurable".
>
> ### Motivation
>
> We have two ways to manage pulsar's resources:
> - By client API (Can manage some resources, such as `create topic`, `create
> subscriber`, and so on)
> - By admin API (Can manage all the resources)
>
> The `client API` has no limit on the request length, but the `admin API`
> does (for example on the HTTP request line and HTTP request headers). This
> restriction is enforced by the built-in web container, Jetty.
>
> Almost all resources can be created by either API, but they can only be
> modified and deleted by the `admin API`. As a result, we cannot modify or
> delete resources that were created by the `client API` with a very long name,
> because the request exceeds Jetty's default HTTP request URI length limit.
>
> ### Goal
>
>  1. For web servers
> Provide a way to modify Jetty's `httpMaxRequestHeaderSize` configuration
> (involves two servers: the web server in pulsar and the web server in
> pulsar-proxy)
>
>  2. For the internal client in pulsar-proxy
> Provide a way to modify Jetty-client's `httpClientRequestBufferSize`
> configuration.
>
> Since the pulsar-proxy handles HTTP requests like this: `pulsar-admin.sh`
> -> `proxy web server` -> `(highlight) internal client in proxy` -> `pulsar
> web server`.
>
> When the internal client forwards a request, it forwards the request header
> and the request body, and all the data passes through a buffer (we call it
> Buf), like this:
> - Receive a request.
> - Put the request line and request headers into the Buf.
> - (highlight) Flush the Buf (if the request line and request headers exceed
>   the length of the Buf, an error is reported).
> - Put the request body into the Buf.
> - Flush the Buf if it is full.
>
> So we need a config to set the `buf size` of the Buf:
> `pulsar-proxy.conf.httpClientRequestBufferSize` -> `buf size of the
> internal client`.
>
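
The failure mode described above can be reproduced with a plain fixed-size buffer. This is only a minimal sketch, assuming a buffer that cannot grow; `RequestBufferSketch` and `fits` are hypothetical names, and the code uses `java.nio.ByteBuffer` rather than Jetty's own buffer handling.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics a fixed-size request buffer, not Jetty's real code.
final class RequestBufferSketch {

    // Returns true if the request line fits into a buffer of the given size.
    static boolean fits(String requestLine, int bufferSize) {
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        try {
            buf.put(requestLine.getBytes(StandardCharsets.US_ASCII));
            return true;
        } catch (BufferOverflowException e) {
            // The request line is longer than the buffer.
            return false;
        }
    }

    public static void main(String[] args) {
        // A topic name of roughly 8 KiB, far longer than a typical one.
        String longTopic = "tp-" + "x".repeat(8 * 1024);
        String requestLine =
                "GET /admin/v2/persistent/public/default/" + longTopic + "/stats HTTP/1.1\r\n";
        System.out.println(fits(requestLine, 4 * 1024));   // false
        System.out.println(fits(requestLine, 16 * 1024));  // true
    }
}
```

With the smaller buffer the write fails before anything is sent, which is why raising only the server-side header limit would not be enough for the proxy path.
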
> ### API Changes
>
>  ServiceConfiguration.java
> ```java
> @FieldContext(
>     category = CATEGORY_HTTP,
>     doc = """
>         The maximum size in bytes of the request header.
>         Larger headers will allow for more and/or larger cookies
>         plus larger form content encoded in a URL.
>         However, larger headers consume more memory and can make a
>         server more vulnerable to denial of service
>         attacks.
>         """
> )
> private int httpMaxRequestHeaderSize = 8 * 1024;
> ```
>
>  ProxyConfiguration.java
>
> ```java
> @FieldContext(
>     minValue = 1,
>     category = CATEGORY_HTTP,
>     doc = """
>         The maximum size in bytes of the request header.
>         Larger headers will allow for more and/or larger cookies
>         plus larger form content encoded in a URL.
>         However, larger headers consume more memory and can make a
>         server more vulnerable to denial of service
>         attacks.
>         """
> )
> private int httpMaxRequestHeaderSize = 8 * 1024;
>
> @FieldContext(
>     minValue = 1,
>     category = CATEGORY_HTTP,
>     doc = """
>         The size of the buffer used to write requests to the broker.
>         If "httpMaxRequestHeaderSize" is larger than
>         "httpClientRequestBufferSize", "httpClientRequestBufferSize"
>         will be set to the value of "httpMaxRequestHeaderSize".
>         """
> )
> private int httpClientRequestBufferSize = httpMaxRequestHeaderSize;
> ```
>
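
One detail that may be worth checking in the proposed defaults: in Java, a field initializer such as `httpClientRequestBufferSize = httpMaxRequestHeaderSize` copies the value once, when the object is constructed. If the configuration is populated afterwards, the documented rule (use the larger of the two values) has to be applied explicitly. A minimal sketch, where `ProxyBufferSizeSketch` and `effectiveClientRequestBufferSize` are hypothetical names and not part of the real `ProxyConfiguration`:

```java
// Hypothetical stand-in for the two proxy settings; not the real ProxyConfiguration class.
final class ProxyBufferSizeSketch {
    int httpMaxRequestHeaderSize = 8 * 1024;
    // Copies 8192 once, at construction time.
    int httpClientRequestBufferSize = httpMaxRequestHeaderSize;

    // Applies the documented rule: never smaller than the max header size.
    int effectiveClientRequestBufferSize() {
        return Math.max(httpClientRequestBufferSize, httpMaxRequestHeaderSize);
    }

    public static void main(String[] args) {
        ProxyBufferSizeSketch conf = new ProxyBufferSizeSketch();
        // Simulates a value assigned while loading the config file.
        conf.httpMaxRequestHeaderSize = 64 * 1024;
        System.out.println(conf.httpClientRequestBufferSize);        // 8192
        System.out.println(conf.effectiveClientRequestBufferSize()); // 65536
    }
}
```
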
> ### Anything else?
>
> This change should be cherry-picked into the previous branches
> (`2.9~2.11`).
>
> If the user uses the `RETRY Topic` or `DLQ` features, Pulsar may
> automatically create topics whose names are too long to be managed. This
> [scenario has been discussed in an earlier email](
> https://lists.apache.org/thread/q1m23ckyy10wvtzy65v8bwqwnh7r0gc8).
>
>
> Thanks
> Yubiao Feng


Re: [DISCUSS] Retry topic should not create for a retry topic

2023-03-10 Thread
+1, I agree with you.

We should prohibit creating Retry and DLQ topics in a loop, because it
leads to uncontrollable behavior.
If we allow it, it may cause great trouble for users.
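
A rough sketch of what such a guard could look like on the client side. It assumes the retry and DLQ suffixes are `-RETRY` and `-DLQ`, matching the constant names used in the test quoted below; `RetryTopicNameSketch` and its methods are hypothetical, and partition suffixes are not handled.

```java
// Hypothetical helper; the suffix values are assumptions, not read from Pulsar itself.
final class RetryTopicNameSketch {
    static final String RETRY_GROUP_TOPIC_SUFFIX = "-RETRY";
    static final String DLQ_GROUP_TOPIC_SUFFIX = "-DLQ";

    // Counts non-overlapping occurrences of keyword in topicName.
    static int countOccurrences(String topicName, String keyword) {
        int count = 0;
        int from = 0;
        while ((from = topicName.indexOf(keyword, from)) >= 0) {
            count++;
            from += keyword.length();
        }
        return count;
    }

    // True if the topic is itself a retry or dead-letter topic, in which case
    // a pattern consumer could skip creating another retry topic for it.
    static boolean isRetryOrDlqTopic(String topicName) {
        return topicName.endsWith(RETRY_GROUP_TOPIC_SUFFIX)
                || topicName.endsWith(DLQ_GROUP_TOPIC_SUFFIX);
    }

    public static void main(String[] args) {
        String nested = "persistent://public/default/tp1-sub1-RETRY-sub1-RETRY-sub1-RETRY";
        System.out.println(countOccurrences(nested, RETRY_GROUP_TOPIC_SUFFIX)); // 3
        System.out.println(isRetryOrDlqTopic(nested));                          // true
        System.out.println(isRetryOrDlqTopic("persistent://public/default/tp1")); // false
    }
}
```
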

Thanks,
Bo


Asaf Mesika wrote on Tue, Feb 14, 2023 at 03:02:

>
> Sounds like a bug for sure.
> How did you plan on solving it?
>
>
> On Mon, Feb 13, 2023 at 12:46 AM Enrico Olivelli 
> wrote:
>
> > On Sun, Feb 12, 2023, 04:42 Yubiao Feng wrote:
> >
> > > Hi Enrico Olivelli
> > >
> > > > It is good to help users to not fall into bad situations but on the
> > > > other case we cannot deal with many silly configurations that you
> > > > could set up, like creating a pipeline of functions that in the end
> > > > create a cycle.
> > >
> > > Sorry, this test just helps to reproduce the problem quickly. The reality
> > > is that there is only one consumer, but every restart triggers this issue
> > > and ends up with a topic like this:
> > > "persistent://public/default/tp1-sub1-RETRY-sub1-RETRY-sub1-RETRY"
> > >
> > > > I wonder if we could simply document this fact instead of adding code
> > >
> > > ```java
> > > Consumer<byte[]> consumer = pulsarClient.newConsumer()
> > >         .topicsPattern("my-property/my-ns/.*")
> > >         .subscriptionName("sub1")
> > >         .enableRetry(true)
> > >         .subscribe();
> > >
> > > With the client restarted, the code above will reproduce the problem.
> > >
> >
> >
> > I see the problem now.
> >
> > We must do something for this case. It must not happen. We have to fix it
> >
> > Thanks for your clarification
> >
> > Enrico
> >
> > >
> > > On Sun, Feb 12, 2023 at 3:31 AM Enrico Olivelli 
> > > wrote:
> > >
> > > > Yubiao,
> > > >
> > > > On Sat, Feb 11, 2023, 19:06 Yubiao Feng wrote:
> > > >
> > > > > Hi community
> > > > >
> > > > > > I am starting a DISCUSS for "Retry topic should not create for a
> > > > > > retry topic."
> > > > > >
> > > > > > If we use regex-topic consumer and enable retry, it is possible to
> > > > > > create such a topic
> > > > > > "persistent://public/default/tp1-sub1-RETRY-sub2-RETRY-sub3-RETRY".
> > > > > > You can reproduce this by using the test below.
> > > > > >
> > > > > > It probably doesn't make sense to create a RETRY/DLQ topic on
> > > > > > RETRY/DLQ. We should avoid this scenario if users use the default
> > > > > > configuration (users can enable it if they need it).
> > > > >
> > > >
> > > > I agree that this is a bad case.
> > > > But should we really care?
> > > >
> > > > You must do it very intentionally.
> > > > > It is good to help users to not fall into bad situations but on the
> > > > > other case we cannot deal with many silly configurations that you
> > > > > could set up, like creating a pipeline of functions that in the end
> > > > > create a cycle.
> > > >
> > > >
> > > > I wonder if we could simply document this fact instead of adding code
> > > >
> > > >
> > > > Enrico
> > > >
> > > >
> > > >
> > > >
> > > > > ```java
> > > > > > @Test
> > > > > > public void testRetryTopicWillNotCreatedForRetryTopic() throws Exception {
> > > > > >     final String topic = "persistent://my-property/my-ns/tp1";
> > > > > >     Producer<byte[]> producer = pulsarClient.newProducer(Schema.BYTES)
> > > > > >             .topic(topic)
> > > > > >             .create();
> > > > > >     for (int i = 0; i < 100; i++) {
> > > > > >         producer.send(String.format("Hello Pulsar [%d]", i).getBytes());
> > > > > >     }
> > > > > >     producer.close();
> > > > > >
> > > > > >     for (int i = 0; i < 10; i++) {
> > > > > >         Consumer<byte[]> consumer = pulsarClient.newConsumer(Schema.BYTES)
> > > > > >                 .topicsPattern("my-property/my-ns/.*")
> > > > > >                 .subscriptionName("sub" + i)
> > > > > >                 .enableRetry(true)
> > > > > >                 .deadLetterPolicy(DeadLetterPolicy.builder().maxRedeliverCount(2).build())
> > > > > >                 .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
> > > > > >                 .subscribe();
> > > > > >         Message<byte[]> message = consumer.receive();
> > > > > >         log.info("consumer received message : {} {}",
> > > > > >                 message.getMessageId(), new String(message.getData()));
> > > > > >         consumer.reconsumeLater(message, 1, TimeUnit.SECONDS);
> > > > > >         consumer.close();
> > > > > >     }
> > > > > >
> > > > > >     Set<String> tps = pulsar.getBrokerService().getTopics().keys().stream()
> > > > > >             .collect(Collectors.toSet());
> > > > > >     try {
> > > > > >         for (String tp : tps) {
> > > > > >             assertTrue(howManyKeyWordRetryInTopicName(tp, RETRY_GROUP_TOPIC_SUFFIX) <= 1, tp);
> > > > > >             assertTrue(howManyKeyWordRetryInTopicName(tp, DLQ_GROUP_TOPIC_SUFFIX) <= 1, tp);
> > > > > >         }
> > > > > >     } finally {
> > > > > >         // cleanup.
> > > > > for 

Re: [VOTE] Pulsar Client Python Release 3.1.0 Candidate 4

2023-03-10 Thread
+1 (binding)

python version: 3.7.6
- Checked the signature
- Install the python .whl file
(pulsar_client-3.1.0-cp37-cp37m-macosx_10_15_universal2.whl) on macOS
12.3.1
- Start the standalone (2.11.0)
- Start consumer (python3 ./examples/consumer.py)
- Start producer (python3 ./examples/producer.py)
- ./tests/run-unit-tests.sh without interrupted_test.py
(apachepulsar/pulsar:2.11.0)

Thanks,
Bo

Matteo Merli wrote on Sat, Mar 11, 2023 at 00:15:
>
> +1 (binding)
> --
> Matteo Merli
> 
>
>
> On Fri, Mar 10, 2023 at 7:44 AM Yunze Xu 
> wrote:
>
> > Hi Zike,
> >
> > It's not a regression. This error applies for Python client 3.0.0 as
> > well. Before 3.0.0, the Python client cannot be interrupted by a
> > signal. I'd rather treat it as the case that the implementation of
> > handling the UNIX signals is not so perfect. Here is the issue:
> > https://github.com/apache/pulsar-client-python/issues/103
> >
> > Thanks,
> > Yunze
> >
> > On Fri, Mar 10, 2023 at 4:25 PM Zike Yang  wrote:
> > >
> > > Hi, Yunze
> > >
> > > > However, with the latest example, the output should be the following
> > > > logs if you pressed the Ctrl+C:
> > >
> > > Thanks for your explanation. It works fine for me now.
> > >
> > > > I think it should not be a blocker, we can open an issue for that. The
> > > > official example uses a try-except block to avoid this issue.
> > >
> > > Will you highlight this in the release note? It seems it has changed
> > > the default behavior. It's better to provide the user with good
> > > practice for this.
> > >
> > > Thanks,
> > > Zike Yang
> > >
> > > On Thu, Mar 9, 2023 at 10:10 PM Yunze Xu 
> > wrote:
> > > >
> > > > > The cause is that `client.close()` was not called. You
> > > > > should make sure `client.close()` is called when receiving a SIGINT
> > > > > signal.
> > > >
> > > > ```
> > > > #8  0x7f194f6aca99 in pulsar::ConsumerImpl::shutdown() [clone
> > .cold] ()
> > > >from
> > /usr/local/lib/python3.7/site-packages/pulsar_client.libs/libpulsar-c7b16888.so
> > > > #9  0x7f194f79dd74 in pulsar::ConsumerImpl::~ConsumerImpl() ()
> > > >from
> > /usr/local/lib/python3.7/site-packages/pulsar_client.libs/libpulsar-c7b16888.so
> > > > #10 0x7f195006a71b in
> > > >
> > pybind11::class_::dealloc(pybind11::detail::value_and_holder&)
> > > > ()
> > > >from /usr/local/lib/python3.7/site-packages/_
> > pulsar.cpython-37m-x86_64-linux-gnu.so
> > > > #11 0x7f1950039f8e in pybind11::detail::clear_instance(_object*) ()
> > > >from /usr/local/lib/python3.7/site-packages/_
> > pulsar.cpython-37m-x86_64-linux-gnu.so
> > > > #12 0x7f195003ac5f in pybind11_object_dealloc ()
> > > >from /usr/local/lib/python3.7/site-packages/_
> > pulsar.cpython-37m-x86_64-linux-gnu.so
> > > > #13 0x7f1950a47927 in dict_dealloc (mp=0x7f194f302320) at
> > > > Objects/dictobject.c:1905
> > > > #14 0x7f1950acc284 in subtype_clear (self= > > > 0x7f1950273cd0>) at Objects/typeobject.c:1123
> > > > #15 0x7f1950aa793d in delete_garbage (old=0x7f1950c4a000
> > > > <_PyRuntime+448>, collectable=0x7fff65e5b2a0)
> > > > at Modules/gcmodule.c:761
> > > > #16 collect (generation=2, n_collected=0x0, n_uncollectable=0x0,
> > > > nofail=1) at Modules/gcmodule.c:913
> > > > #17 0x7f1950af92a2 in _PyGC_CollectNoFail () at
> > Modules/gcmodule.c:1602
> > > > #18 0x7f1950aec340 in PyImport_Cleanup () at Python/import.c:526
> > > > #19 0x7f1950aee5a8 in Py_FinalizeEx () at Python/pylifecycle.c:1199
> > > > #20 0x7f1950af6b48 in pymain_main (pymain=0x7fff65e5b510) at
> > > > Modules/main.c:3127
> > > > #21 0x7f1950af6a1e in _Py_UnixMain (argc=,
> > > > argv=) at Modules/main.c:3160
> > > > #22 0x7f1950768d0a in __libc_start_main (main=0x55e11cf1b050
> > > > , argc=2, argv=0x7fff65e5b668,
> > > > t>, fini=, rtld_fini=,
> > > > stack_end=0x7fff65e5b658) at ../csu/libc-start.c:308
> > > > #23 0x55e11cf1b08a in _start ()
> > > > ```
> > > >
> > > > I think it should not be a blocker, we can open an issue for that. The
> > > > official example uses a try-except block to avoid this issue.
> > > >
> > > > Thanks,
> > > > Yunze
> > > >
> > > > On Thu, Mar 9, 2023 at 9:48 PM Yunze Xu  wrote:
> > > > >
> > > > > Hi Zike,
> > > > >
> > > > > Did you run the latest example? I still see the exceptional info:
> > > > >
> > > > > ```
> > > > > line 1243, in receive
> > > > > msg = self._consumer.receive()
> > > > > _pulsar.Interrupted: Pulsar error: ResultInterrupted
> > > > > ```
> > > > >
> > > > > However, with the latest example, the output should be the following
> > > > > logs if you pressed the Ctrl+C:
> > > > >
> > > > > ```
> > > > > ^CStop receiving messages
> > > > > ```
> > > > >
> > > > > Because the exception from `consumer.receive()`  is caught.
> > > > >
> > > > > ```python3
> > > > > try:
> > > > > msg = consumer.receive()
> > > > > # ...
> > > > > except pulsar.Interrupted:
> 

Re: [VOTE] Pulsar Release 2.10.4 Candidate 1

2023-03-08 Thread
+1 (binding)

os: mac 12.3.1, Intel
java: OpenJDK 17.0.1

- Checked the signature
- Checked LICENSE
- Start standalone
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package (maven 3.6.1, openJDK 11.0.12)
- Run a simple transaction performance check

Thanks,
Bo

guo jiwei wrote on Wed, Mar 8, 2023 at 15:00:
>
> +1 (binding)
>
> - Build from the source package
> - Checked the signature
> - Publish and consume messages
> - Verified Function and State Function
> - Verified Cassandra connector
>
> Regards
> Jiwei Guo (Tboy)
>
> On Tue, Mar 7, 2023 at 2:22 PM Xiangying Meng  wrote:
> >
> > Please ignore the previous email. This commit did not break CI.
> > Instead, a very coincidental thing happened.
> > 1. There may be problems with the maven server at that time. The three PRs
> > mentioned at that time could not download the correct jar package, and the
> > retry was invalid.
> > 2. A flaky test `recoverLongTimeAfterMultipleWriteErrors` failed multiple
> > times in a row.
> >
> > So I mistakenly thought it was caused by the last unverified commit.
> > So the RC is correct, please help verify it and vote.
> >
> > Thanks
> > Xiangying
> >
> > On Sun, Mar 5, 2023 at 9:40 PM Xiangying Meng  wrote:
> >
> > > Hi, community,
> > >
> > > Sorry to tell everyone that we may need to abort the release
> > > 2.10.4-candidate-1 because some CI can not be passed after #19674 [0] is
> > > cherry-picked.
> > > I will be sure to carry out the release process again as soon as it is
> > > resolved.
> > >
> > > Sincerely
> > > Xiangying
> > > [0] https://github.com/apache/pulsar/pull/19674
> > >
> > >
> > > On Sat, Mar 4, 2023 at 12:06 PM Xiangying Meng 
> > > wrote:
> > >
> > > >> This is the first release candidate for Apache Pulsar, version 2.10.4.
> > >>
> > >> This release contains 99 commits by 34 contributors.
> > >> https://github.com/apache/pulsar/compare/v2.10.3...v2.10.4-candidate-1
> > >>
> > >> *** Please download, test, and vote on this release. This vote will stay
> > >> open
> > >> for at least 72 hours ***
> > >>
> > >> Note that we are voting upon the source (tag), binaries are provided for
> > >> convenience.
> > >>
> > >> Source and binary files:
> > >> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.10.4-candidate-1/
> > >>
> > >> SHA-512 checksums:
> > >> 8cae74a5b586ab2378c2b2737c59507180af4b8efab4a99bc0dae233096036f5b18ab94255bea03e416d8d21958bedf684c8d4bd3982f458a547d3e1efa0f19f
> > >>  apache-pulsar-2.10.4-bin.tar.gz
> > >> 74e16c61ff6ae9e2a51e7ae24981598c71dabbff09c820bff9303c031882e1f15d029d06b6b5b6e4cc9a02b8957a102338ce09173c8744a59e5bd848b48b1d2a
> > >>  apache-pulsar-2.10.4-src.tar.gz
> > >>
> > >> Maven staging repo:
> > >> https://repository.apache.org/content/repositories/orgapachepulsar-1210/
> > >>
> > >> The tag to be voted upon:
> > >> v2.10.4-candidate-1 (d1aebd3e4c9503406845fb2e746a289e88e00fb2)
> > >> https://github.com/apache/pulsar/releases/tag/v2.10.4-candidate-1
> > >>
> > >> Pulsar's KEYS file containing PGP keys you use to sign the release:
> > >> https://downloads.apache.org/pulsar/KEYS
> > >>
> > >> Docker images:
> > >>
> > >> 
> > >>
> > >> https://hub.docker.com/layers/xiangyingmeng/pulsar/2.10.4/images/sha256-144d0380592a7e0578772eb2fa51da7cad70f1d5f8a2b46189669b15f0e6b4b6?context=repo
> > >>
> > >> 
> > >>
> > >> https://hub.docker.com/layers/xiangyingmeng/pulsar-all/2.10.4/images/sha256-bcf03c05be93ced24991afbcca13f4a4b5f183d9a7b877ae84e992e16ca599ee?context=repo
> > >>
> > >> Please download the source package, and follow the README to build
> > >> and run the Pulsar standalone service.
> > >>
> > >


Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-08 Thread
 Hi siNan:

From my point of view, it is just a plug-in. I don't think it is
necessary to add configuration for the plugin.
Doing so adds little value and makes it harder for users to use.


SiNan Liu wrote on Wed, Mar 8, 2023 at 15:54:
>
> Hi, bo.
>
> 1. I understand what you say, to develop a new
> `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than changing
> existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few small
> problems:
>
> (1)ProtobufNativeAdvancedSchemaCompatibilityCheck and
> ProtobufNativeSchemaCompatibilityCheck schemaType is PROTOBUF_NATIVE. It
> looks like both checkers are PROTOBUF not using AVRO-PROTOBUF's "native"
> implementation, which leads to some problems or "unreasonable" and gives me
> some extended thinking and questions.
>
`CompatibilityCheck` is only a plugin.
`ProtobufNativeSchemaCompatibilityCheck` may be retired sooner or later;
when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable, we can
make it the default checker.

It is just a plug-in, so users can change it at will and are responsible
for making sure it is used correctly.

> (2)In broker.conf
>
> `schemaRegistryCompatibilityCheckers`. If
> ProtobufNativeSchemaCompatibilityCheck and
> ProtobufNativeAdvancedSchemaCompatibilityCheck all set. This is going to
> overwrite each other. Because this is a map:
>
> https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
>
> ```java
> Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> for (String className : checkerClasses) {
>     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(
>             className, SchemaCompatibilityCheck.class,
>             Thread.currentThread().getContextClassLoader());
>     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> }
> ```
>
> Is this a big problem or a small one? Is it possible or unnecessary? Maybe
> we can write in the documentation that only one of the two protobufNative
> checkers can be chosen? Why are there two checkers for different
> implementations of the same schemaType? Why not have the checker create
> different validators, so that we do not have to change
> `schemaRegistryCompatibilityCheckers`?

Users can only use one of them, not both; allowing both would add complexity for users.
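
The overwrite behaviour follows from the checkers being stored in a map keyed by schema type. A minimal sketch with stand-in types (`CheckerRegistrySketch`, `Checker`, and the two checker classes are hypothetical and only mirror the shape of the snippet quoted above):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in types; not the real Pulsar SchemaType or SchemaCompatibilityCheck.
final class CheckerRegistrySketch {
    enum SchemaType { AVRO, JSON, PROTOBUF_NATIVE }

    interface Checker {
        SchemaType getSchemaType();
    }

    static final class BasicProtobufNativeChecker implements Checker {
        public SchemaType getSchemaType() { return SchemaType.PROTOBUF_NATIVE; }
    }

    static final class AdvancedProtobufNativeChecker implements Checker {
        public SchemaType getSchemaType() { return SchemaType.PROTOBUF_NATIVE; }
    }

    // Registers checkers by schema type; a later checker with the same type
    // replaces the earlier one, because Map.put overwrites existing keys.
    static Map<SchemaType, Checker> register(List<Checker> checkers) {
        Map<SchemaType, Checker> byType = new HashMap<>();
        for (Checker checker : checkers) {
            byType.put(checker.getSchemaType(), checker);
        }
        return byType;
    }

    public static void main(String[] args) {
        Map<SchemaType, Checker> byType = register(
                List.of(new BasicProtobufNativeChecker(), new AdvancedProtobufNativeChecker()));
        System.out.println(byType.size()); // 1
        System.out.println(byType.get(SchemaType.PROTOBUF_NATIVE).getClass().getSimpleName());
        // AdvancedProtobufNativeChecker
    }
}
```

So if both class names are configured, only the one processed last stays registered for `PROTOBUF_NATIVE`.
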

>
> (3)And after the update to ProtobufNativeAdvancedSchemaCompatibilityCheck.
> Existing topics previously only checked the name of the root message, not
> the content of protobuf.
>
> What if the user wants both Checkers?
>
> Set to ProtobufNativeAdvancedSchemaCompatibilityCheck, affect the topic of
> the existing schema?
>
> Older topics still use the old checker, and newer topics or certain older
> topics use the new advancedchecker.
>
When `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
because it is not a complete checker.

> (4)So should we have one schemaType for a checker? protobufNativeChecker
> can have as many different implementation classes as possible. This
> classname configuration in PIP, let's see if it can be set at the topic
> level. In the current PIP design I just load this parameter into the
> checker when the broker is started and the checkers map is set up. Can I do
> this in the new normal pr if I want to support topic level? Or perfect it
> here?
>
> Or add a schemaType called PROTOBUF_NATIVE_ADVANCE that corresponds to
> ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be more trouble.)
>
> Sorry I can not use the computer and network in the company, I use my
> mobile phone to reply to the email, the format may be a bit messy. Please
> understand.
>
> Thanks,
>
> sinan
>
>
> 丛搏 wrote on Tue, Mar 7, 2023 at 11:39 PM:
>
> > SiNan Liu wrote on Tue, Mar 7, 2023 at 13:22:
> > >
> > > Great to see your comment, bo!
> > >
> > > 1. The first way. The protobuf website has a description of the rules,
> > but
> > > no plans to implement them.
> > > https://protobuf.dev/programming-guides/proto/#updating
> >
> > https://groups.google.com/g/protobuf
> > maybe ask here
> >
> > >
> > > 2. I think this PIP can be divided into two parts.
> > > (1) Add a flag(`ValidatorClassName`), load it into
> > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > ValidatorClassName is empty by default, and the implementation continues
> > as
> > > before, with no change for the user.
> >
> > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`
> > ```
> >
> > schemaRegistryCompatibilityChec

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-07 Thread
SiNan Liu wrote on Tue, Mar 7, 2023 at 13:22:
>
> Great to see your comment, bo!
>
> 1. The first way. The protobuf website has a description of the rules, but
> no plans to implement them.
> https://protobuf.dev/programming-guides/proto/#updating

https://groups.google.com/g/protobuf
maybe ask here

>
> 2. I think this PIP can be divided into two parts.
> (1) Add a flag(`ValidatorClassName`), load it into
> `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> ValidatorClassName is empty by default, and the implementation continues as
> before, with no change for the user.

`ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`
```
schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
```
I do not recommend that we modify this plugin directly and keep adding
configuration items, which will cause trouble for users.
We already have a lot of configs and they are getting very unwieldy.
In my opinion, we should not change
`org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`.
It is a simple implementation, it rarely goes wrong, and most users will
use it. We can add another checker, named
`ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar. That way,
we don't need to add this flag. There is no need to consider
compatibility; it is just a plug-in and will not affect current logic.
If users need it, they just change the plugin to the new implementation.

> ```java
> ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
>     for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
>         // The default implementation only checks if the root message has changed.
>         if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
>             throw new ProtoBufCanReadCheckException("Protobuf root message isn't allow change!");
>         }
>     }
> };
> ```
> The `ValidatorClassName` value can also be set to the implementation added
> by this PIP,
> `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
>
> (2) Rework `ProtobufNativeSchemaCompatibilityCheck` so that the flag
> (`ValidatorClassName`) selects which `ProtobufNativeSchemaValidator` is built.
> Isn't it just a plug-in? The user can develop and choose a different
> `ProtobufNativeSchemaValidator`. I think it doesn't change the logic; it
> just allows it to be extended.
>
>
> I think this PIP should be an enhancement and supplement to the function,
> and there is no such thing as unnecessary and meaningless.
>
>
> Thanks,
> sinan
>
>
>
>
>
> 丛搏 wrote on Tue, Mar 7, 2023 at 11:53:
>
> > I think we have two ways to do that.
> >
> > First way: We need to advance the improvement of java in protobuf. Ask
> > if they have plans to improve.
> >
> > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > be implemented as a plugin, don't change any existing plugin logic
> > (it's simple and already used). I don't recommend adding flags for
> > rollback, it adds configuration and makes little sense.
> >
> > Thanks,
> > Bo
> >
> > Asaf Mesika wrote on Mon, Mar 6, 2023 at 23:00:
> >
> > >
> > > Can you convert the code block which is actually a quote in the
> > > beginning of the PIP to something which doesn't require to scroll
> > > horizontally so much?
> > > Use
> > >
> > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > >
> > > Let's improve the clarity of what you wrote:
> > >
> > > "the PROTOBUF uses avro struct to store."
> > > -->
> > > When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> > > to it as message data is an auto-generated POJO containing the
> > annotations
> > > encoding the schema. The client is using a converter, which converts a
> > > Protobuf schema descriptor into an Avro schema and sends that as the
> > Schema
> > > of the producer/consumer.
> > >
> > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > org.apache.avro.Schema."
> > > -->
> > > Since the schema is an Avro schema, the implementation of compatibility
> > > check on the broker side is to simply re-use the compatibility check of
> > the
> > > AVRO 

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-06 Thread
I think we have two ways to do that.

First way: push for this improvement in the protobuf Java library itself,
and ask the maintainers whether they plan to implement it.

Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
be implemented as a plugin, without changing any existing plugin logic
(it's simple and already in use). I don't recommend adding flags for
rollback; that adds configuration and makes little sense.

Thanks,
Bo

Asaf Mesika wrote on Mon, Mar 6, 2023 at 23:00:

>
> Can you convert the code block which is actually a quote in the
> beginning of the PIP to something which doesn't require to scroll
> horizontally so much?
> Use
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>
> Let's improve the clarity of what you wrote:
>
> "the PROTOBUF uses avro struct to store."
> -->
> When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> to it as message data is an auto-generated POJO containing the annotations
> encoding the schema. The client is using a converter, which converts a
> Protobuf schema descriptor into an Avro schema and sends that as the Schema
> of the producer/consumer.
>
> "On the broker side, protobuf and avro both use SchemaData converted to
> org.apache.avro.Schema."
> -->
> Since the schema is an Avro schema, the implementation of compatibility
> check on the broker side is to simply re-use the compatibility check of the
> AVRO schema type.
>
> "ProtobufSchema is different from ProtobufNativeSchema in schema
> compatibility check it uses avro-protobuf.
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> But the current implementation of ProtobufNative schema compatibility
> check only
> checked if the root message name is changed."
>
> -->
> PROTOBUF_NATIVE schema type is different.
> The client is actually using Protobuf Descriptor as the schema, as opposed
> to Avro schema of PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE
> compatibility check actually hasn't implemented any rule, besides one:
> checking if the root message name has changed.
>
>
>
> >1. For now, there is no official or third-party solution for ProtoBuf
> >compatibility. If in the future have better solutions of a third party or
> >the official, we develop new ProtobufNativeSchemaValidator and use, so
> >add a flag.
> >
> Why do you need to make that configurable? Once you find a third party,
> just switch to it. Who knows, maybe you never will. Introduce it when you
> find it, not now.
>
>
> > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
> > these strategies. As with the AVRO implementation, protobuf compatibility
> > checking need implementing the canRead method. *This will check that
> > the writtenschema can be read by readSchema.*
>
>
> I completely disagree.
> Avro implementation is confusing for our use case. Don't copy that.
>
> You have
>
>     public void checkCompatible(SchemaData from, SchemaData to,
>             SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
>         Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
>         Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
>         switch (strategy) {
>             case BACKWARD_TRANSITIVE:
>             case BACKWARD:
>             case FORWARD_TRANSITIVE:
>             case FORWARD:
>             case FULL_TRANSITIVE:
>             case FULL:
>                 checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
>                 return;
>             case ALWAYS_COMPATIBLE:
>                 return;
>             default:
>                 throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
>         }
>     }
>
> I would rename :
> from --> currentSchema
> to --> newSchema
>
> Use that switch case and have a method for each like:
> validateBackwardsCompatibility(currentSchema, newSchema)
>
> I dislike canRead and usage of writtenSchema, since you have two completely
> different use cases: from the producing side and the consumer side.
>
> schemaValidatorBuilder
> >
> > I dislike this proposal. IMO Avro implementation is way too complicated.
> Why not have a simple function for validation for each switch case above?
> Why do we need strategy and builder, and all this complexity?
>
>
> *Here are the basic compatibility rules we've defined:*
>
>
> IMO it's impossible to read the validation rules as you described them.
> I wrote how they should be structured numerous times above.
> I can't validate them.
>
>
> IMO, the current design is very hard to read.
> Please try to avoid jumping into code sections.
> Write a high level design section, in which you describe in words what you
> plan to do.
> Write the validation rules in the structure that is easy to understand:
> rules per each compatibility check, and use proper words (current schema,
> new schema), since new schema can be once used for read 

Re: [Vote] PIP-245: Make subscriptions of non-persistent topic non-durable

2023-03-06 Thread
+1 (binding)

Thanks,
Bo

On Mon, Mar 6, 2023 at 7:10 PM guo jiwei wrote:
>
> +1 (binding)
>
> Regards
> Jiwei Guo (Tboy)
>
> On Mon, Mar 6, 2023 at 9:59 AM Yunze Xu  wrote:
> >
> > +1 (binding)
> >
> > Thanks,
> > Yunze
> >
> > On Fri, Mar 3, 2023 at 11:46 AM PengHui Li  wrote:
> > >
> > > +1 (binding)
> > >
> > > Penghui
> > >
> > > > On Feb 13, 2023, at 14:56, Jiuming Tao  
> > > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I would like to start a VOTE on `PIP-245: Make subscriptions of 
> > > > non-persistent topic non-durable`.
> > > >
> > > > Motivation:
> > > >
> > > > There are two types of subscriptions for a topic: Durable and 
> > > > Non-durable.
> > > >
> > > > We create a Consumer with a Durable subscription and a Reader with a 
> > > > Non-durable subscription.
> > > >
> > > > > But for a NonPersistentTopic, creating a Durable subscription is
> > > > > meaningless: a NonPersistentSubscription doesn't have a ManagedCursor to
> > > > > persist its data. After its consumers disconnect, the subscription
> > > > > can't be removed automatically unless we set the value of
> > > > > subscriptionExpirationTimeMinutes greater than 0.
> > > > >
> > > > > subscriptionExpirationTimeMinutes controls subscription expiration for
> > > > > both NonPersistentTopic and PersistentTopic. If we set its value greater
> > > > > than 0, it may lead to data loss (the durable subscriptions of a
> > > > > PersistentTopic can also be removed).
> > > > >
> > > > > Non-durable subscriptions are removed automatically after all their
> > > > > consumers disconnect; this is the existing logic.
> > > > >
> > > > > To remove the subscriptions of a NonPersistentTopic that have no active
> > > > > consumers, and for the reasons above, we can make all the subscriptions
> > > > > of a NonPersistentTopic Non-durable.
> > > >
> > > >
> > > >
> > > > For more details, you can read: 
> > > > https://github.com/apache/pulsar/issues/19448 
> > > > 
> > > >
> > > > And the discuss thread is available at: 
> > > > https://lists.apache.org/thread/2ltmyglnb25jy8nk58twkwbglws43bst 
> > > > 
> > > >
> > > > Thanks,
> > > > Tao Jiuming
> > >


Re: [VOTE][PIP-242] Topic name restriction

2023-02-21 Thread
+1 binding

On Tue, Feb 21, 2023 at 3:21 PM avinash kala wrote:
>
> +1
>
> On Tue, Feb 21, 2023, 12:44 PM Haiting Jiang  wrote:
>
> > +1 binding
> >
> > Haiting
> >
> > On Tue, Feb 21, 2023 at 3:07 PM guo jiwei  wrote:
> > >
> > > +1 (binding)
> > >
> > >
> > > Regards
> > > Jiwei Guo (Tboy)
> > >
> > > On Mon, Feb 20, 2023 at 6:06 PM Zike Yang  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Thanks,
> > > > Zike Yang
> > > >
> > > >
> > > > On Mon, Feb 20, 2023 at 1:53 PM PengHui Li  wrote:
> > > > >
> > > > > +1(binding)
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Mon, Feb 20, 2023 at 11:54 AM Cong Zhao 
> > wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Thanks,
> > > > > > Cong
> > > > > >
> > > > > > On 2023/02/18 08:58:26 mattisonc...@gmail.com wrote:
> > > > > > > Hi, All
> > > > > > >
> > > > > > > After a fascinating discussion, I would start the vote of
> > PIP-242.
> > > > > > >
> > > > > > > We have chosen to drop out the `system topic` related
> > improvement to
> > > > > > another PIP. Therefore, the current version is simple enough and
> > it has a
> > > > > > clear boundary.
> > > > > > >
> > > > > > > Please leave +1/-1 in this thread to join the vote. and feel
> > free to
> > > > > > leave any concerns.
> > > > > > >
> > > > > > > Thanks to you guys.
> > > > > > >
> > > > > > > Best,
> > > > > > > Mattison
> > > > > > >
> > > > > > > References:
> > > > > > >
> > > > > > > • PIP https://github.com/apache/pulsar/issues/19239
> > > > > > > • Discussion
> > > > > > https://lists.apache.org/thread/oz79m0f2nw059jctq4cmms74yq5n2l1m
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> >


Re: [VOTE] Pulsar Client C++ Release 3.1.2 Candidate 1

2023-02-14 Thread
23008]
> > MultiTopicsConsumerImpl:274 | Successfully Subscribed to a single
> > partition of topic in TopicsConsumer. Partitions need to create : 0
> > 2023-02-13 09:28:31.195 INFO  [139733718923008]
> > MultiTopicsConsumerImpl:137 | Successfully Subscribed to Topics
> > 2023-02-13 09:29:29.226 INFO  [139733722046144] ConsumerImpl:1097 |
> > [persistent://public/default/partition-testMessageListenerPause-partition-0,
> > subscription-name, 1] Closing consumer for topic
> > persistent://public/default/partition-testMessageListenerPause-partition-0
> > 2023-02-13 09:29:29.229 INFO  [139733718923008] ConsumerImpl:1083 |
> > [persistent://public/default/partition-testMessageListenerPause-partition-0,
> > subscription-name, 1] Closed consumer 1
> > 2023-02-13 09:29:29.230 INFO  [139733722046144] ProducerImpl:740 |
> > [persistent://public/default/partition-testMessageListenerPause-partition-0,
> > standalone-0-0] Closing producer for topic
> > persistent://public/default/partition-testMessageListenerPause-partition-0
> > 2023-02-13 09:29:29.233 INFO  [139733718923008] ProducerImpl:704 |
> > [persistent://public/default/partition-testMessageListenerPause-partition-0,
> > standalone-0-0] Closed producer 0
> > 2023-02-13 09:29:29.233 INFO  [139733722046144] ClientImpl:516 |
> > Closing Pulsar client with 0 producers and 0 consumers
> > 2023-02-13 09:29:29.234 INFO  [139733685270272] ClientConnection:1600
> > | [127.0.0.1:37066 -> 127.0.0.1:6650] Connection closed with
> > ConnectError
> > 2023-02-13 09:29:29.234 INFO  [139733685270272] ClientConnection:269 |
> > [127.0.0.1:37066 -> 127.0.0.1:6650] Destroyed connection
> > 2023-02-13 09:29:29.234 INFO  [139733722046144] ProducerImpl:694 |
> > Producer - 
> > [persistent://public/default/partition-testMessageListenerPause-partition-0,
> > standalone-0-0] , [batchMessageContainer = { BatchMessageContainer
> > [size = 0] [bytes = 0] [maxSize = 1000] [maxBytes = 131072] [topicName
> > = 
> > persistent://public/default/partition-testMessageListenerPause-partition-0]
> > [numberOfBatchesSent_ = 10001] [averageBatchSize_ = 0.] }]
> > [   OK ] BasicEndToEndTest.testMessageListenerPause (58470 ms)
> > ```
> >
> > You can also verify the master branch in your local env. If it still
> > failed, it would be something wrong with your own env. Please open an
> > issue for it.
> >
> > Thanks,
> > Yunze
> >
> > On Fri, Feb 10, 2023 at 2:59 PM 丛搏  wrote:
> > >
> > > I ran ./pulsar-tests and it blocks at this test:
> > > ```
> > > [ RUN  ] BasicEndToEndTest.testMessageListenerPause
> > > 2023-02-10 14:53:57.415 INFO  [0x101084580] BasicEndToEndTest:922 | res = 
> > > 204
> > > 2023-02-10 14:53:57.415 INFO  [0x101084580] ClientConnection:190 |
> > > [ -> pulsar://localhost:6650] Create ClientConnection,
> > > timeout=1
> > > 2023-02-10 14:53:57.415 INFO  [0x101084580] ConnectionPool:97 |
> > > Created connection for pulsar://localhost:6650
> > > 2023-02-10 14:53:57.418 INFO  [0x16ffaf000] ClientConnection:388 |
> > > [[::1]:51199 -> [::1]:6650] Connected to broker
> > > 2023-02-10 14:53:57.433 INFO  [0x16ffaf000] HandlerBase:72 |
> > > [persistent://public/default/partition-testMessageListenerPause, ]
> > > Getting connection from pool
> > > 2023-02-10 14:53:57.502 INFO  [0x16ffaf000] ProducerImpl:202 |
> > > [persistent://public/default/partition-testMessageListenerPause, ]
> > > Created producer on broker [[::1]:51199 -> [::1]:6650]
> > > 2023-02-10 14:53:57.502 INFO  [0x101084580] Client:87 | Subscribing on
> > > Topic :partition-testMessageListenerPause
> > > 2023-02-10 14:53:57.506 INFO  [0x16ffaf000] HandlerBase:72 |
> > > [persistent://public/default/partition-testMessageListenerPause,
> > > subscription-name, 0] Getting connection from pool
> > > 2023-02-10 14:53:57.549 INFO  [0x16ffaf000] ConsumerImpl:238 |
> > > [persistent://public/default/partition-testMessageListenerPause,
> > > subscription-name, 0] Created consumer on broker [[::1]:51199 ->
> > > [::1]:6650]
> > > 2023-02-10 14:53:57.549 INFO  [0x101084580] ConsumerImpl:289 |
> > > [persistent://public/default/partition-testMessageListenerPause,
> > > subscription-name, 0] Unsubscribing
> > > 2023-02-10 14:53:57.574 INFO  [0x16ffaf000] ConsumerImpl:294 |
> > > [persistent://public/default/partition-testMessageListenerPause,
> > > subscription-name, 0] Unsubscribed successfully
> > > 2023-02-10 14:53:57.574 INFO  [0x101084580] Client:87

Re: [VOTE][PIP-240] A new API to unload subscriptions

2023-02-12 Thread
+1 (binding)

Thanks,
Bo

On Mon, Feb 13, 2023 at 9:51 AM 易客 萧 wrote:
>
> +1 (non-binding)
> 
> From: Yubiao Feng 
> Sent: Sunday, February 12, 2023 20:04
> To: dev@pulsar.apache.org 
> Subject: [VOTE][PIP-240] A new API to unload subscriptions
>
> Hi Pulsar Community
>
> I would like to start a VOTE on "A new API to unload subscriptions"
>
> The proposal can be read at
> https://github.com/apache/pulsar/issues/19187
>
> and the discussion thread is available at
> https://lists.apache.org/thread/55n75h9lyfscnc04wooz9p676cg44w79
>
> Voting will stay open for at least 48h.
>
> Thanks
> Yubiao Feng


Re: [VOTE] Pulsar Client C++ Release 3.1.2 Candidate 1

2023-02-09 Thread
I ran ./pulsar-tests and it blocks at this test:
```
[ RUN  ] BasicEndToEndTest.testMessageListenerPause
2023-02-10 14:53:57.415 INFO  [0x101084580] BasicEndToEndTest:922 | res = 204
2023-02-10 14:53:57.415 INFO  [0x101084580] ClientConnection:190 |
[ -> pulsar://localhost:6650] Create ClientConnection,
timeout=1
2023-02-10 14:53:57.415 INFO  [0x101084580] ConnectionPool:97 |
Created connection for pulsar://localhost:6650
2023-02-10 14:53:57.418 INFO  [0x16ffaf000] ClientConnection:388 |
[[::1]:51199 -> [::1]:6650] Connected to broker
2023-02-10 14:53:57.433 INFO  [0x16ffaf000] HandlerBase:72 |
[persistent://public/default/partition-testMessageListenerPause, ]
Getting connection from pool
2023-02-10 14:53:57.502 INFO  [0x16ffaf000] ProducerImpl:202 |
[persistent://public/default/partition-testMessageListenerPause, ]
Created producer on broker [[::1]:51199 -> [::1]:6650]
2023-02-10 14:53:57.502 INFO  [0x101084580] Client:87 | Subscribing on
Topic :partition-testMessageListenerPause
2023-02-10 14:53:57.506 INFO  [0x16ffaf000] HandlerBase:72 |
[persistent://public/default/partition-testMessageListenerPause,
subscription-name, 0] Getting connection from pool
2023-02-10 14:53:57.549 INFO  [0x16ffaf000] ConsumerImpl:238 |
[persistent://public/default/partition-testMessageListenerPause,
subscription-name, 0] Created consumer on broker [[::1]:51199 ->
[::1]:6650]
2023-02-10 14:53:57.549 INFO  [0x101084580] ConsumerImpl:289 |
[persistent://public/default/partition-testMessageListenerPause,
subscription-name, 0] Unsubscribing
2023-02-10 14:53:57.574 INFO  [0x16ffaf000] ConsumerImpl:294 |
[persistent://public/default/partition-testMessageListenerPause,
subscription-name, 0] Unsubscribed successfully
2023-02-10 14:53:57.574 INFO  [0x101084580] Client:87 | Subscribing on
Topic :partition-testMessageListenerPause
2023-02-10 14:53:57.580 INFO  [0x16ffaf000] HandlerBase:72 |
[persistent://public/default/partition-testMessageListenerPause,
subscription-name, 1] Getting connection from pool
2023-02-10 14:53:57.628 INFO  [0x16ffaf000] ConsumerImpl:238 |
[persistent://public/default/partition-testMessageListenerPause,
subscription-name, 1] Created consumer on broker [[::1]:51199 ->
[::1]:6650]
```

I'm not sure what the problem is.

Thanks,
Bo

On Thu, Feb 9, 2023 at 6:28 PM Baodi Shi wrote:
>
>  +1 (non-binding)
>
> * Verify checksum and signatures
> * Build from source (MacOS arm64 13.1)
> * Verify oauth2 set `TlsTrustCertsFilePath`(libpulsarwithdeps.a)
> * Verify the oauth2 set `TlsTrustCertsFilePath` on Node.js client
>
>
> Thanks,
> Baodi Shi
>
>
> On Feb 9, 2023 at 04:33:36, Matteo Merli wrote:
>
> > +1
> >
> > --
> > Matteo Merli
> > 
> >
> > On Wed, Feb 8, 2023 at 12:28 AM Yunze Xu 
> > wrote:
> >
> >
> > This is the first release candidate for Apache Pulsar Client C++, version
> > 3.1.2.
> >
> >
> > It fixes the following issues:
> >
> >
> > https://github.com/apache/pulsar-client-cpp/issues?q=label%3Arelease%2F3.1.2
> >
> >
> > *** Please download, test and vote on this release. This vote will stay
> > open
> >
> > for at least 72 hours ***
> >
> >
> > Note that we are voting upon the source (tag), binaries are provided for
> >
> > convenience.
> >
> >
> > Source and binary files:
> >
> >
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-cpp/pulsar-client-cpp-3.1.2-candidate-1/
> >
> >
> > SHA-512 checksums:
> >
> >
> > b8ecada9f9287b3c135c5b939385a59053193cab6c0b41d6faa074e98f596ad5c86d903d4bc581512fe10c1cd35ee74a2adcfb699f2e34349be2ee1d75f81cf4
> >
> >  apache-pulsar-client-cpp-3.1.2.tar.gz
> >
> >
> > The tag to be voted upon:
> >
> > v3.1.2-candidate-1 (9257d147d2cdfdd3817f9f1194d11797ed3d485c)
> >
> > https://github.com/apache/pulsar-client-cpp/releases/tag/v3.1.2-candidate-1
> >
> >
> > Pulsar's KEYS file containing PGP keys you use to sign the release:
> >
> > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >
> >
> > Please download the source package, and follow the README to compile and
> > test.
> >
> >


Re: [VOTE] Pulsar Client Python Release 3.1.0 Candidate 2

2023-02-02 Thread
+1(binding)
os: mac 12.6
python: 3.9.6

python3 ./examples/consumer.py
python3 ./examples/producer.py

Thanks,
Bo

On Fri, Feb 3, 2023 at 3:26 AM Matteo Merli wrote:
>
> +1
> --
> Matteo Merli
> 
>
> On Thu, Feb 2, 2023 at 5:57 AM Yunze Xu  wrote:
> >
> > This is the 2nd release candidate for Apache Pulsar Client Python,
> > version 3.1.0.
> >
> > It fixes the following issues:
> > https://github.com/apache/pulsar-client-python/milestone/2?closed=1
> >
> > *** Please download, test and vote on this release. This vote will
> > stay open for at least 72 hours ***
> >
> > Python wheels:
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-python-3.1.0-candidate-2/
> >
> > The supported python versions are 3.7, 3.8, 3.9, 3.10 and 3.11. The
> > supported platforms and architectures are:
> > - Windows x86_64 (windows/)
> > - glibc-based Linux x86_64 (linux-glibc-x86_64/)
> > - glibc-based Linux arm64 (linux-glibc-arm64/)
> > - musl-based Linux x86_64 (linux-musl-x86_64/)
> > - musl-based Linux arm64 (linux-musl-arm64/)
> > - macOS universal 2 (macos/)
> >
> > The tag to be voted upon: v3.1.0-candidate-2
> > (fda50867a9c7bf927309527fade2f53eb3907bed)
> > https://github.com/apache/pulsar-client-python/releases/tag/v3.1.0-candidate-2
> >
> > Pulsar's KEYS file containing PGP keys you use to sign the release:
> > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >
> > Please download the Python wheels and follow the README to test.


Re: [Vote] PIP-242: Introduce enableStrictTopicName to reject creating topic with -partition- keyword.

2023-01-31 Thread
+1 (binding)

Thanks,
Bo

On Tue, Jan 31, 2023 at 4:52 PM ZhangJian He wrote:
>
> +1 (non-binding)
>
> Thanks
> ZhangJian He
>
>
> On Tue, 31 Jan 2023 at 16:51, PengHui Li  wrote:
>
> > +1 (binding)
> >
> > Penghui
> >
> > On Tue, Jan 31, 2023 at 3:52 PM Nitin Goyal 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Tue, Jan 31, 2023 at 12:29 PM guo jiwei  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > >
> > > > Regards
> > > > Jiwei Guo (Tboy)
> > > >
> > > > On Tue, Jan 31, 2023 at 2:36 PM Yunze Xu  > >
> > > > wrote:
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Tue, Jan 31, 2023 at 6:57 AM  wrote:
> > > > > >
> > > > > > Hello everyone.
> > > > > >
> > > > > > I would like to start the vote for PIP-242
> > > > https://github.com/apache/pulsar/issues/19239,
> > > > > > Please let me know if you have any concerns or questions.
> > > > > >
> > > > > > Best,
> > > > > > Mattison
> > > > > >
> > > > > > --- Paste original PIP content to help quote --
> > > > > >
> > > > > > ### Motivation
> > > > > >
> > > > > > Currently, the Apache Pulsar broker allows users to create a topic
> > > > name that includes `-partition-`, which is confusing for our developers
> > > to
> > > > identify whether this is a partition of a partitioned topic. Plus, we
> > > need
> > > > to add more logic to be compatible with this special topic name. for
> > > > example:
> > > > > >
> > > > > > - https://github.com/apache/pulsar/pull/19240
> > > > > > - https://github.com/apache/pulsar/pull/19230
> > > > > > - https://github.com/apache/pulsar/pull/19171
> > > > > > - https://github.com/apache/pulsar/pull/19086
> > > > > > - ...
> > > > > >
> > > > > > ### Goal
> > > > > > This proposal wants `-partition-` to be a topic name keyword. Users
> > > > can only create a topic with it if the topic is partitioned. For the
> > > > compatibility reason, we want to Introduce a new configuration -
> > > > `enableStrictTopicName` for the broker to help reject creating a topic
> > in
> > > > the following cases:
> > > > > > 1. Create a partitioned topic that includes `-partition-`.
> > > > > > 2. Create a topic which is not a partitioned topic.
> > > > > >
> > > > > > **Create a topic:**
> > > > > > _no corresponding partitioned topic_
> > > > > >
> > > > > > - persistent://public/default/local-name (passed)
> > > > > > - persistent://public/default/local-name-partition-z (rejected by
> > > > keyword)
> > > > > > - persistent://public/default/local-name-partition-0 (rejected by
> > > > keyword)
> > > > > >
> > > > > > _Has corresponding partitioned topic, **partitions=2** and topic
> > > > partition name is **persistent://public/default/local-name**_
> > > > > >
> > > > > > - persistent://public/default/local-name-partition-0 (passed,
> > Because
> > > > it is the partition topic's sub-partition)
> > > > > > - persistent://public/default/local-name-partition-z (rejected by
> > > > keyword)
> > > > > > - persistent://public/default/local-name-partition-4 (rejected,
> > > > Because it exceeds the number of maximum partitions)
> > > > > >
> > > > > > **Create a partitioned topic(topic metadata)**
> > > > > >
> > > > > > - persistent://public/default/local-name (passed)
> > > > > > - persistent://public/default/local-name-partition-z (rejected by
> > > > keyword)
> > > > > > - persistent://public/default/local-name-partition-0 (rejected by
> > > > keyword)
> > > > > >
> > > > > >
> > > > > > ### API Changes
> > > > > >
> > > > > > Add a new configuration, `enableStrictTopicName=false`.
> > > > > >
> > > > > > ### Implementation
> > > > > >
> > > > > > 1. Add configuration `enableStrictTopicName=false`.
> > > > > > > 2. Add rejection logic when the user enables `enableStrictTopicName`.
> > > > > > > 3. Add warning logs to inform users that we do not recommend creating
> > > > > > > non-partitioned topics with the keyword `-partition-`.
> > > > > > > 4. Make `enableStrictTopicName=true` in the next major release.
> > > >
> > >
> > >
> > > --
> > > Regards
> > > Nitin Goyal
> > >
> >


Re: [DISCUSS] PIP-240 A new API to unload subscriptions

2023-01-12 Thread
> I would invest more time in:
> - monitoring tools (tools to detect quickly stuck consumers)
> - circuit breakers (fast fail/shut the door to consumers/producers
> that don't behave correctly)
> - guard rails (limits to clients to prevent them to exhaust the
> resources on the brokers)

I agree with this point of view. We should not increase the complexity
of the code by adding non-essential APIs that extend unload; unload is
not fundamentally meant to solve stuck-consumer or blocked-producer
problems.

Thanks,
Bo

On Thu, Jan 12, 2023 at 4:12 PM Enrico Olivelli wrote:
>
> Yubiao,
> thanks for sharing your problem and a proposal, this is very helpful
> for the community to get in touch with the pain of Pulsar
> users/administrators.
>
> In my experience if a "subscription is stuck", the problems are:
> * the client has some problems (bug in the client/misconfiguration
> somewhere) - 99.9%
> * there is a bug in Pulsar - 0.1%
>
> Unloading a topic is an operation that triggers some reset of the
> state on both the broker and the clients and this usually TEMPORARY
> unblocks the subscription.
>
> I have never seen a problem that is temporarily solved by topic
> unload/broker restart to be permanently solved with that operation.
> If there is a problem we should spend time on investigating the
> problem and not in adding this kind of tool.
>
> I believe that we should not continue to add these kinds of hacks into Pulsar:
> - easy reset...
> - ignore errors... (catch Throwable...)
>
> The overall result is a system that "seems to work" but it actually
> doesn't work properly
>
> I would invest more time in:
> - monitoring tools (tools to detect quickly stuck consumers)
> - circuit breakers (fast fail/shut the door to consumers/producers
> that don't behave correctly)
> - guard rails (limits to clients to prevent them to exhaust the
> resources on the brokers)
>
>
>
> Enrico
>
> On Thu, Jan 12, 2023 at 08:22  wrote:
> >
> > Hi, Yubiao
> >
> > I agree with this idea because some users care about the production rate. 
> > They don't want to unload the whole topic to fix the subscription problem.
> >
> > I've got some questions:
> >
> > 1. How do you handle the race condition when you are trying to unload the 
> > subscription, and the new consumer wants to subscribe to this subscription 
> > at the same time? I'm unsure if it has the race condition. I just want to 
> > remind you about that. :)
> > 2. Would you like to add some restful API design to clarify the 
> > implementation?
> > a. Request method
> > b. Request path
> > c. Response code
> > d. etc.
> >
> >
> > Thanks for your work.
> > Mattison
> > On Jan 11, 2023, 17:01 +0800, Yubiao Feng 
> > , wrote:
> > > Hi community
> > >
> > > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.
> > >
> > > PIP issue: https://github.com/apache/pulsar/issues/19187
> > >
> > > ### Motivation
> > >
> > > We sometimes try to unload the topic to resolve some consumption-stop
> > > issues. But the unloading topic will also impact the producer side.
> > >
> > > ### Goal
> > >
> > > Provide a new API to unload at the subscription level. It triggers
> > > reconnection of all consumers on that subscription, and reconnection is
> > > guaranteed by the client. The API will be used in these ways:
> > > - unload a specific subscription of one topic (or partitioned topic)
> > > - unload all subscriptions of one topic (or partitioned topic)
> > > - unload subscriptions of one topic (or partitioned topic) by regular
> > > expression
> > > - If a reader's subscription name is not set, a random subscription name
> > > prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users
> > > can unload these subscriptions using regular expressions.
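
A minimal sketch of the "by regular expression" case in the list above, assuming the broker already has the list of subscription names for the topic. UnloadSelectionSketch and selectSubscriptions are hypothetical names; the proposal text does not define the API shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class UnloadSelectionSketch {

    /** Returns the subscription names that fully match the given regular expression. */
    static List<String> selectSubscriptions(List<String> subscriptionNames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String name : subscriptionNames) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

For the reader case, a pattern such as `(multiTopicsReader-|reader-).*` would select the randomly named reader subscriptions while leaving named consumer subscriptions alone.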
> > >
> > > In addition to triggering consumer disconnection, Unloading Subscribers
> > > will restart the Dispatcher, which resets the redeliver message queue and
> > > delayed message queue in the Broker's memory, which can help resolve
> > > issues caused by an abnormal dispatcher state. However, the execution
> > > flow of Unloading Subscribers does not include a restart of the Managed
> > > Cursor related to this dispatcher; if there is a problem with the cursor,
> > > we can only rely on the unload topic to solve it.
> > >
> > > Note: From the client's perspective, this connection may be shared by
> > > consumers, producers, and transactions, so Unloading Subscribers may
> > > impact the producer and transaction.
> > >
> > >  These scenarios are not supported
> > > - Functions `message-dedup`, `geo-replication`, and `shadow-topic` also
> > > read messages from the topic, but Unloading Subscribers will not support
> > > triggering restarts of these three functions (because the cursor is used
> > > directly to read the data in these scenarios, not the consumer or reader).
> > > - The Compaction task (subscription name is `__compaction`) also uses a
> > > reader to read data, but Unloading 

Re: [DISCUSS] Registering Jackson Java 8 support modules by default for all Pulsar components, including client

2023-01-09 Thread
Hi, Lari:

Will it affect compatibility? If it is just an improvement, I think it
can also be added to the pulsar-common module. Since it adds a
dependency, it needs a PIP for discussion.

Thanks,
Bo

On Mon, Jan 9, 2023 at 7:06 PM Lari Hotari wrote:
>
> Hi all,
>
> Jackson has separate Java 8 support modules for adding support for proper
> serialization and deserialization of new classes that were added in Java 8
> (Java 8 was released in 2014).
>
> These Jackson Java 8 support modules haven't been used in the Pulsar code 
> base. This is a pity. This causes a lot of pain when using Java Time classes 
> in Pulsar applications or Pulsar Functions. There are ways to get the classes 
> working for applications, but the documentation is missing. It would make 
> things easier if the Java 8 support modules for Jackson would be included and 
> registered by default.
>
> I have created a PR to register Jackson Java 8 support modules by default for 
> all Pulsar components. The PR is https://github.com/apache/pulsar/pull/19161 .
>
> Please review and provide feedback. Do we need a PIP for this change?
>
> -Lari


Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-05 Thread
Hi Yunze,

> It's a good idea to use `ProtocolVersion` to control. But adding a
> negative schema type still looks weird.

The negative schema type is a historical problem (including the `NONE`
schema type). I don't think it is a good implementation; it adds too much
complexity. The broker can control any schema-type behavior. This problem
itself exists because the subscribe command is not created with
`AUTO_CONSUME`. We have written a lot of complicated code to work around
this historical problem. Adding new fields would be even less acceptable
than a negative schema type. We should choose a simpler and more direct
way to deal with this matter instead of making it more complicated.

Thanks,
Bo

On Fri, Jan 6, 2023 at 1:18 PM Yunze Xu wrote:
>
> Hi Bo,
>
> >  the old server compatibility can add `ProtocolVersion` to control.
>
> It's a good idea to use `ProtocolVersion` to control. But adding a
> negative schema type still looks weird. You can find the following
> description in SchemaType.java, which was added in
> https://github.com/apache/pulsar/pull/3940:
>
> ```java
> // Schemas that don't have schema info. the value should be negative.
> ```
>
> If you expose the negative schema type in PulsarApi.proto, how could
> you explain to users that the "new" schema type is a negative integer?
> And for developers, the negative schema types should not have the
> schema info, but you create a schema info for it.
>
> Thanks,
> Yunze
>
> On Fri, Jan 6, 2023 at 1:07 PM 丛搏  wrote:
> >
> > > Instead, we can add an optional field into CommandSubscribe to
> > > indicate the schema compatibility check is skipped.
> > > ```protobuf
> > > optional bool check_schema_compatibility = 20 [default = true]
> > > ```
> > `check_schema_compatibility` carries too many meanings. I think
> > this change will make the code more uncontrollable.
> > I still suggest uploading the `AUTO_CONSUME` type directly. Compatibility
> > with old servers can be controlled with `ProtocolVersion`. Adding any
> > other field in proto or uploading the `AUTO_CONSUME` type directly
> > makes no difference. Other modifications may lead to ambiguity.
> >
> > Thanks,
> > Bo
> >
> > On Fri, Jan 6, 2023 at 12:17 AM SiNan Liu wrote:
> > >
> > > I just updated the PIP issue and title, you guys can have a look. 
> > > issue19113
> > > <https://github.com/apache/pulsar/issues/19113>
> > > I added `check_schema_compatibility` in CommandSubscribe, and I also made
> > > many other changes.
> > >
> > > On Thu, Jan 5, 2023 at 2:33 PM Yunze Xu wrote:
> > >
> > > > It's not related to the schema itself. When an AUTO_CONSUME consumer
> > > > subscribes to a topic, the option tells the broker that it's an
> > > > AUTO_CONSUME consumer so that the broker should not treat it as an
> > > > active consumer when performing schema compatibility check. If there
> > > > is a consumer that also wants to ignore the schema compatibility check
> > > > in future, this option can be reused.
> > > >
> > > > The other important reason is the breaking change by carrying the
> > > > schema info on an AUTO_CONSUME consumer. (See my explanations in
> > > > GitHub and the mail list) If the consumer serves an old version
> > > > consumer, the schema could be uploaded into the registry and other
> > > > clients would be affected. So we should keep not carrying the schema
> > > > info in CommandSubscribe for an AUTO_CONSUME consumer.
> > > >
> > > > Thanks,
> > > > Yunze
> > > >
> > > > On Thu, Jan 5, 2023 at 11:55 AM SiNan Liu  
> > > > wrote:
> > > > >
> > > > > I modified the PIP issue and title last night. Yunze, do you mean that
> > > > > in PulsarApi.proto we put `optional bool is_auto_consume_schema = 6
> > > > > [default = false];` in CommandSubscribe instead of Schema? But shouldn't
> > > > > schema-related stuff be in Schema?
> > > > >
> > > > > Thanks,
> > > > > Sinan
> > > > >
> > > > > On Thu, Jan 5, 2023 at 12:31 AM Yunze Xu wrote:
> > > > >
> > > > > > I found a similar compatibility problem with my closed PR. We should
> > > > > > not set the `Schema` field for AUTO_CONSUME schema. More explanations
> > > > > > can be found here [1].
> > > > > >
> > > > > > Instead, we can add an optional field 

Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-05 Thread
 client.newProducer(Schema.AVRO(User.class))
> > > > > .topic(topic)
> > > > > .create();
> > > > > ```
> > > > >
> > > > > - If the schema type is 0 in CommandSubscribe, the NONE schema will
> > be
> > > > > persisted and the producer will fail to create due to the schema
> > > > > compatibility check.
> > > > > - If the schema type is -3 (AUTO_CONSUME), it will fail at
> > subscribe()
> > > > > with the following error:
> > > > >
> > > > > ```
> > > > > 23:49:10.978 [pulsar-io-18-13] WARN
> > > > > org.apache.pulsar.broker.service.ServerCnx - [/172.23.160.1:5921]
> > Got
> > > > > exception java.lang.IllegalStateException: Some required fields are
> > > > > missing
> > > > > at
> > > >
> > org.apache.pulsar.common.api.proto.Schema.checkRequiredFields(Schema.java:337)
> > > > > at
> > > > org.apache.pulsar.common.api.proto.Schema.parseFrom(Schema.java:332)
> > > > > at
> > > >
> > org.apache.pulsar.common.api.proto.CommandSubscribe.parseFrom(CommandSubscribe.java:785)
> > > > > at
> > > >
> > org.apache.pulsar.common.api.proto.BaseCommand.parseFrom(BaseCommand.java:2397)
> > > > > ```
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > >
> > > > > > On Wed, Jan 4, 2023 at 10:34 PM SiNan Liu wrote:
> > > > > > >
> > > > > > > I just implemented adding an optional field to the subscribe request, and
> > > > > > > compatibility seems to be fine. You can have a look at my PR
> > > > > > > (https://github.com/apache/pulsar/pull/17449).
> > > > > >
> > > > > > > On Wed, Jan 4, 2023 at 21:31 Yunze Xu wrote:
> > > > > >
> > > > > > > > Why can't we upload negative schema types?
> > > > > > >
> > > > > > > > I want to avoid the changes to existing methods like
> > > > > > > > Commands#getSchemaType, which converts all negative schema types to
> > > > > > > > NONE:
> > > > > > >
> > > > > > > ```java
> > > > > > > > private static Schema.Type getSchemaType(SchemaType type) {
> > > > > > > >     if (type.getValue() < 0) {
> > > > > > > >         return Schema.Type.None;
> > > > > > > >     } else {
> > > > > > > >         return Schema.Type.valueOf(type.getValue());
> > > > > > > >     }
> > > > > > > > }
> > > > > > > ```
> > > > > > >
> > > > > > > > I guess the above code was written because:
> > > > > > > > 1. NONE schema type means it's not uploaded into the registry. (See
> > > > > > > > #3940 [1])
> > > > > > > > 2. There is no existing schema that uses NONE as its schema type, i.e.
> > > > > > > > NONE schema is used as something special.
> > > > > > >
> > > > > > > > every different language client will code the special logic.
> > > > > > >
> > > > > > > > If other clients follow the behavior of the Java client, they should
> > > > > > > > also convert negative schemas to NONE currently. Therefore, changes
> > > > > > > > cannot be avoided. No matter if the semantic of `setSchemaType` is
> > > > > > > > changed, they should follow the Java implementation as well.
> > > > > > >
> > > > > > > > This will change the meaning of the schema data field
> > > > > > >
> > > > > > > > The existing definition only defines its meaning to the AVRO and JSON
> > > > > > > > schema. But from a more general view, the schema data should be
> > > > > > > > something associated with the current schema. Giving it more meaning
> > > > > > > > for other schema types is acceptable IMO. For example, the schema data
> > > > > > > > field represents the serialized Protobuf descriptor in Protobuf Native
> > > > > > > > schema, see `ProtobufNativeSchema#of`:
> > > > > >

Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-04 Thread
> It does not affect the public API so it can be cherry-picked into old
> branches. The main difference with this proposal is that my solution
> carries the identity info (i.e. `AUTO_CONSUME`) in the schema data,
> which is a byte array. The negative schema types should not be exposed
> to users. Adding a field to the subscribe request might be okay but it
> could be unnecessary to cover such a corner case.

This will change the meaning of the schema data field and couple the
schema type to the schema data. Having `schema type = NONE` and
`schema data = "AUTO_CONSUME"` together represent `AUTO_CONSUME` looks
weird to me. Why can't we upload negative schema types?

> It does not affect the public API
Uploading negative schema types only changes the proto. If we use
`schema type = NONE` and `schema data = "AUTO_CONSUME"`, every different
language client will code the special logic, and this special logic can
easily be overlooked.
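
To make the comparison concrete, here is a minimal sketch of the two
encodings. The class, constants, and method names are hypothetical and
only model the values mentioned in this thread (NONE = 0,
AUTO_CONSUME = -3); this is not the Pulsar client or broker code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the two proposals; not Pulsar's actual classes.
final class SchemaEncodingSketch {
    static final int NONE = 0;           // schema type sent on the wire today
    static final int AUTO_CONSUME = -3;  // negative, client-internal type

    // Marker bytes a client would have to agree on under option 2 (assumption).
    static final byte[] MARKER = "AUTO_CONSUME".getBytes(StandardCharsets.UTF_8);

    // Option 1: the negative type is carried in the proto unchanged,
    // so the broker only needs to compare the type.
    static boolean isAutoConsumeByType(int wireType) {
        return wireType == AUTO_CONSUME;
    }

    // Option 2: the type stays NONE and the identity is hidden in the
    // schema data, so type and data must always be inspected together.
    static boolean isAutoConsumeByData(int wireType, byte[] schemaData) {
        return wireType == NONE && Arrays.equals(schemaData, MARKER);
    }

    public static void main(String[] args) {
        System.out.println(isAutoConsumeByType(AUTO_CONSUME));
        System.out.println(isAutoConsumeByData(NONE, MARKER));
        // A client that forgets the marker is indistinguishable from NONE.
        System.out.println(isAutoConsumeByData(NONE, new byte[0]));
    }
}
```

With option 2, every client has to reproduce the marker bytes exactly,
which is the special logic I am worried about.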

Thanks,
Bo

On Wed, Jan 4, 2023 at 17:02 Yunze Xu wrote:
>
> I opened a PR to fix this issue: https://github.com/apache/pulsar/pull/19128
>
> It does not affect the public API so it can be cherry-picked into old
> branches. The main difference with this proposal is that my solution
> carries the identity info (i.e. `AUTO_CONSUME`) in the schema data,
> which is a byte array. The negative schema types should not be exposed
> to users. Adding a field to the subscribe request might be okay but it
> could be unnecessary to cover such a corner case.
>
> It might be controversial if schema data should be used in such a way,
> because the original purpose is to represent the AVRO or JSON
> definition. However, this semantic is defined just for AVRO or JSON
> schema. IMO, the data field of other schemas is never used well.
>
> Another solution is to make use of the name field of schema, which
> might be more natural. I think we can continue the discussion in my
> PR.
>
> Thanks,
> Yunze
>
> On Wed, Jan 4, 2023 at 11:07 AM Yunze Xu  wrote:
> >
> > Modifying the subscribe request is better than exposing AUTO_CONSUME
> > schema type IMO. The negative value of a schema type, like BYTES,
> > AUTO_PRODUCE, means this schema type should only be used internally.
> > Adding the negative enum value to the Schema definition in
> > PulsarApi.proto looks very weird.
> >
> > But I'm still wondering if we can avoid the API changes. I will look
> > deeper into this issue.
> >
> > Thanks,
> > Yunze
> >
> > On Wed, Jan 4, 2023 at 12:12 AM Enrico Olivelli  wrote:
> > >
> > > > On Tue, Jan 3, 2023 at 14:37 Yunze Xu wrote:
> > >
> > > > Hi Bo,
> > > >
> > > > I got it now. The PIP title sounds ambiguous. Using the term "Upload
> > > > xxx SchemaType" sounds like uploading the schema into the registry.
> > > > Instead, it should be "carrying schema in the request when subscribing
> > > > with AUTO_CONSUME schema".
> > > >
> > >
> > >
> > > I agree that we should change the naming and we should probably not use a
> > > new Schema type but add an optional field in the subscribe request (and do
> > > not send it if the broker is an old version)
> > >
> > >
> > > Enrico
> > >
> > >
> > >
> > > > Thanks,
> > > > Yunze
> > > >
> > > > On Tue, Jan 3, 2023 at 4:56 PM 丛搏  wrote:
> > > > >
> > > > > Hi, Yunze
> > > > > > What I am concerned about is that if the old clients with other
> > > > > > schemas (i.e. schema is neither null nor AUTO_CONSUME) subscribe to
> > > > > > the topic with AUTO_CONSUME schema, what will happen?
> > > > >
> > > > > > The AUTO_CONSUME schema will not be stored in `SchemaRegistryServiceImpl`;
> > > > > > it only indicates that a consumer with the AUTO_CONSUME schema is
> > > > > > subscribing to a topic. If old clients with other schemas subscribe to
> > > > > > this topic, their behavior will not be changed by this PIP.
> > > > > >
> > > > > > > What's the schema compatibility check rule on a topic with
> > > > > > > AUTO_CONSUME schema?
> > > > > >
> > > > > > It only concerns the consumer schema compatibility check, not the topic.
> > > > > > A consumer with the AUTO_CONSUME schema will not do any compatibility check.
> > > > >
> > > > > Thanks,
> > > > > Bo
> > > > >
> > > > > > On Tue, Jan 3, 2023 at 10:16 Yunze Xu wrote:
> > > > > >
> > > > > > What I am conce

Re: [VOTE] Pulsar Release 2.11.0 Candidate-5

2023-01-04 Thread
+1 (non-binding)

system: mac os 12.6, Apple M1
maven: 3.8.5
java: OpenJDK 17.0.3

- Checked the signature
- Checked LICENSE
- Start standalone with zookeeper stream storage
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package
- Run a simple transaction performance check
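
The SHA-512 checksums listed in the announcement can also be compared
programmatically. Below is a minimal sketch using only the JDK; the
class and method names are hypothetical, it is not part of the release
tooling, and it reads the whole file into memory, which is an
assumption that is fine for a quick check but not for very large files.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a downloaded artifact against a published digest.
final class Sha512Sketch {
    // Returns the lowercase hex SHA-512 digest of the given bytes.
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            StringBuilder sb = new StringBuilder(128);
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Compares ignoring case and surrounding whitespace in the published value.
    static boolean matches(byte[] data, String expectedHex) {
        return sha512Hex(data).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: Sha512Sketch <file> <expected-sha512-hex>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(data, args[1]) ? "OK" : "MISMATCH");
    }
}
```

The two arguments would be the downloaded tarball and the digest copied
from the vote mail.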

Thanks,
Bo

On Wed, Jan 4, 2023 at 16:41 Yunze Xu wrote:
>
> Okay, I will give my +1 (binding).
>
> Thanks,
> Yunze
>
> On Wed, Jan 4, 2023 at 4:00 PM PengHui Li  wrote:
> >
> > I think it's ok.
> > > 2.11.0 is the first release that includes pulsar-shell,
> > > so this is not a regression introduced in 2.11.0, and it is not a
> > > critical security or license issue that would block users from moving
> > > to the new version.
> >
> > Thanks,
> > Penghui
> >
> >
> >
> > On Wed, Jan 4, 2023 at 3:36 PM Yunze Xu 
> > wrote:
> >
> > > +0 (binding)
> > >
> > > - Checked the signature
> > > - Build from source (Java 17, Ubuntu 20.04 WSL2)
> > > - Start standalone with KoP branch-2.11 (ab9d761f)
> > > - Verify Pulsar client 2.11.0, Kafka client 3.3.1 (through KoP)
> > > - Verify pulsar-shell on Ubuntu 20.04 WSL2 and Windows:
> > > 1. Use `client` command to produce and consume messages
> > > 2. Use `admin` command to create, list and delete topics
> > >
> > > I didn't give +1 because I have to modify the
> > > `bin/pulsar-admin-common.cmd` to make `pulsar-shell` work on Windows.
> > > > I left a comment here:
> > > https://github.com/apache/pulsar/pull/17243#discussion_r1061199501.
> > > Not sure if it's a blocker so I didn't give +1 or -1 until someone
> > > could explain it.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Wed, Jan 4, 2023 at 2:37 PM PengHui Li  wrote:
> > > >
> > > > > I found there are both apache-pulsar-shell-2.11.0-bin.tar.gz and
> > > > apache-pulsar-shell-2.11.0-bin.zip, are they duplicates?
> > > >
> > > > I noticed here is the discussion about this part in Slack
> > > > https://apache-pulsar.slack.com/archives/C5ZSVEN4E/p1672411025574369
> > > >
> > > > Penghui
> > > >
> > > > On Wed, Jan 4, 2023 at 2:35 PM Yunze Xu 
> > > > wrote:
> > > >
> > > > > I found there are both apache-pulsar-shell-2.11.0-bin.tar.gz and
> > > > > apache-pulsar-shell-2.11.0-bin.zip, are they duplicates?
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Mon, Jan 2, 2023 at 8:01 PM guo jiwei  wrote:
> > > > > >
> > > > > > > This is the fifth release candidate for Apache Pulsar, version 2.11.0.
> > > > > > >
> > > > > > > This release contains 1617 commits by 61 contributors.
> > > > > > > https://github.com/apache/pulsar/compare/v2.10.2...v2.11.0-candidate-5
> > > > > > >
> > > > > > > CI for this release candidate
> > > > > > > https://github.com/Technoboy-/pulsar/pull/26
> > > > > > >
> > > > > > > *** Please download, test and vote on this release. This vote will stay open
> > > > > > > for at least 72 hours ***
> > > > > > >
> > > > > > > Note that we are voting upon the source (tag), binaries are provided for
> > > > > > > convenience.
> > > > > > >
> > > > > > > Source and binary files:
> > > > > > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.11.0-candidate-5
> > > > > > >
> > > > > > > SHA-512 checksums:
> > > > > > >
> > > > > > > 5bf77f600ac23c7ec72696655a49801e0755d67fb4c59c7ab30e9449b061aacef61ca227f106e6b7f5f490fc2ac57412ac2af3ff064ba1e6a31c162d4ca2ba6d
> > > > > > >  ./apache-pulsar-2.11.0-bin.tar.gz
> > > > > > >
> > > > > > > 0a3cab04b4a95ced3745d8b41fc3fa2632c91f77cb10e5c7e008b09ba2f279fc14de109626a44a9482b5354c5fa17bbc07ee369a32a252e358298e494c0fc1cb
> > > > > > >  ./apache-pulsar-2.11.0-src.tar.gz
> > > > > > >
> > > > > > > Maven staging repo:
> > > > > > > https://repository.apache.org/content/repositories/orgapachepulsar-1203/
> > > > > > >
> > > > > > > The tag to be voted upon:
> > > > > > > v2.11.0-candidate-5 (97ee1a114aa1e5df05d86a9dd47f7fe7a2194211)
> > > > > > > https://github.com/apache/pulsar/releases/tag/v2.11.0-candidate-5
> > > > > > >
> > > > > > > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > > > > > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > > > > > >
> > > > > > > https://hub.docker.com/layers/mattison/pulsar-all/2.11.0-rc5/images/sha256-548a518bb18890ddf8e48f0bad62695cfc4f743b9e1223ca45b2e03dba6e9149
> > > > > > >
> > > > > > > https://hub.docker.com/layers/mattison/pulsar/2.11.0-rc5/images/sha256-3b755fb67d49abeb7ab6a76b7123cc474375e3881526db26f43c8cfccdaa3cf6
> > > > > > >
> > > > > > > Please download the source package, and follow the
> > > > > > > release-candidate-validation doc to build
> > > > > > > and run the Pulsar standalone service.
> > > > > > > https://pulsar.apache.org/contribute/validate-release-candidate
> > > > > > >
> > > > > > > Since the metadata store has changed from ZK to RocksDB, verifying
> > > > > > > the `stateful functions` requires setting the parameter "export
> > > > > > > PULSAR_STANDALONE_USE_ZOOKEEPER=1"
> > 

[ANNOUNCE] Apache Pulsar 2.9.4 released

2023-01-03 Thread
The Apache Pulsar team is proud to announce Apache Pulsar version 2.9.4.

Pulsar is a highly scalable, low latency messaging platform running on
commodity hardware. It provides simple pub-sub semantics over topics,
guaranteed at-least-once delivery of messages, automatic cursor management for
subscribers, and cross-datacenter replication.

For Pulsar release details and downloads, visit:
https://pulsar.apache.org/download

Release Notes are at: https://pulsar.apache.org/release-notes/

We would like to thank the contributors that made the release possible.

Regards,

The Pulsar Team


Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-03 Thread
+1 (non-binding)

Thanks,
Bo

On Tue, Jan 3, 2023 at 16:56 丛搏 wrote:
>
> Hi, Yunze
> > What I am concerned about is that if the old clients with other
> > schemas (i.e. schema is neither null nor AUTO_CONSUME) subscribe to
> > the topic with AUTO_CONSUME schema, what will happen?
>
> The AUTO_CONSUME schema will not be stored in `SchemaRegistryServiceImpl`;
> it only indicates that a consumer with the AUTO_CONSUME schema is
> subscribing to a topic. If old clients with other schemas subscribe to
> this topic, their behavior will not be changed by this PIP.
>
> > What's the schema compatibility check rule on a topic with AUTO_CONSUME
> > schema?
>
> It only concerns the consumer schema compatibility check, not the topic.
> A consumer with the AUTO_CONSUME schema will not do any compatibility check.
>
> Thanks,
> Bo
>
> On Tue, Jan 3, 2023 at 10:16 Yunze Xu wrote:
> >
> > What I am concerned about is that if the old clients with other
> > schemas (i.e. schema is neither null nor AUTO_CONSUME) subscribe to
> > the topic with AUTO_CONSUME schema, what will happen? What's the
> > schema compatibility check rule on a topic with AUTO_CONSUME schema?
> >
> > Thanks,
> > Yunze
> >
> > On Mon, Jan 2, 2023 at 12:38 AM SiNan Liu  wrote:
> > >
> > > > 1. The values of Schema.Type and org.apache.pulsar.common.schema.SchemaType
> > > > should be the same.
> > > > 2. These changes do not affect producers; they only affect consumer
> > > > subscribe behavior.
> > > > 3. Backward compatibility:
> > > > (1) In org.apache.pulsar.broker.service.ServerCnx#handleSubscribe:
> > > >     if (schema != null && schema.getType() != SchemaType.AUTO_CONSUME) {
> > > >         return topic.addSchemaIfIdleOrCheckCompatible(schema)
> > > >                 .thenCompose(v -> topic.subscribe(option));
> > > >     } else {
> > > >         return topic.subscribe(option);
> > > >     }
> > > > With an older Pulsar client, the schema is null when an AUTO_CONSUME
> > > > consumer subscribes to the topic.
> > > > With a new Pulsar client, when an AUTO_CONSUME consumer subscribes to the
> > > > topic, the schema is not null and schema.getType() == SchemaType.AUTO_CONSUME.
> > > > In both cases, the broker returns topic.subscribe(option).
> > > >
> > > > (2) In org.apache.pulsar.broker.service.persistent.PersistentTopic
> > > > #addSchemaIfIdleOrCheckCompatible:
> > > >     @Override
> > > >     public CompletableFuture<Void> addSchemaIfIdleOrCheckCompatible(SchemaData schema) {
> > > >         return hasSchema().thenCompose((hasSchema) -> {
> > > >             int numActiveConsumersWithoutAutoSchema = subscriptions.values().stream()
> > > >                     .mapToInt(subscription -> subscription.getConsumers().stream()
> > > >                             .filter(consumer -> consumer.getSchemaType() != SchemaType.AUTO_CONSUME)
> > > >                             .toList().size())
> > > >                     .sum();
> > > >             if (hasSchema
> > > >                     || (!producers.isEmpty())
> > > >                     || (numActiveConsumersWithoutAutoSchema != 0)
> > > >                     || (ledger.getTotalSize() != 0)) {
> > > >                 return checkSchemaCompatibleForConsumer(schema);
> > > >             } else {
> > > >                 return addSchema(schema).thenCompose(schemaVersion ->
> > > >                         CompletableFuture.completedFuture(null));
> > > >             }
> > > >         });
> > > >     }
> > > > There is only one case in which a bug remains.
> > > > First, an old Pulsar client consumes the empty topic with an AUTO_CONSUME
> > > > consumer schema, and then a new or old Pulsar client consumes the topic
> > > > with another schema (i.e. the schema is AVRO).
> > > > The broker will return the error IncompatibleSchemaException ("Topic does
> > > > not have a schema to check"). The bug in issue 17354 is not fixed in this
> > > > case.
> > > > All other cases behave normally.
> > >
> > > > On Sat, Dec 31, 2022 at 20:23 Yunze Xu wrote:
> > >
> > > > Defining `AutoConsume` as -3 is somehow strange. Could you clarify if
> > > > backward compatibility is guaranteed? i.e. if the new Pulsar client
> > > > uploaded the AUTO_CONSUME schema to the broker, can the old Pulsar
> > > > clients produce or consume the same topic anymore?
> > > >
> > > > Thanks,
> > > > Yunze
> > > >
> > > > On Fri, Dec 30, 2022 at 11:32 PM 思楠刘  wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I made a PIP to discuss: 
> > > > > https://github.com/apache/pulsar/issues/19113.
> > > > >
> > > > > Thanks,
> > > > > Sinan
> > > >


Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-03 Thread
Hi, Yunze
> What I am concerned about is that if the old clients with other
> schemas (i.e. schema is neither null nor AUTO_CONSUME) subscribe to
> the topic with AUTO_CONSUME schema, what will happen?

The AUTO_CONSUME schema will not be stored in `SchemaRegistryServiceImpl`;
it only indicates that a consumer with the AUTO_CONSUME schema is
subscribing to a topic. If old clients with other schemas subscribe to
this topic, their behavior will not be changed by this PIP.

> What's the schema compatibility check rule on a topic with AUTO_CONSUME
> schema?

It only concerns the consumer schema compatibility check, not the topic.
A consumer with the AUTO_CONSUME schema will not do any compatibility check.
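
A minimal sketch of that rule, following the `handleSubscribe` branch
quoted below from SiNan's mail. The enum and method names here are
hypothetical stand-ins so the example is self-contained; they are not
the broker's real types.

```java
// Hypothetical model of the subscribe-time decision; not the broker code.
final class SubscribeCheckSketch {
    // Only the schema types needed for the illustration (assumption).
    enum SchemaType { AVRO, JSON, AUTO_CONSUME }

    enum Action { CHECK_COMPATIBILITY_THEN_SUBSCRIBE, SUBSCRIBE_ONLY }

    // schema == null models an old client that sends no schema for AUTO_CONSUME.
    static Action onSubscribe(SchemaType schema) {
        if (schema != null && schema != SchemaType.AUTO_CONSUME) {
            return Action.CHECK_COMPATIBILITY_THEN_SUBSCRIBE;
        }
        // Old client (null) and new client (AUTO_CONSUME) take the same path.
        return Action.SUBSCRIBE_ONLY;
    }

    public static void main(String[] args) {
        System.out.println(onSubscribe(null));
        System.out.println(onSubscribe(SchemaType.AUTO_CONSUME));
        System.out.println(onSubscribe(SchemaType.AVRO));
    }
}
```

Only consumers that carry a concrete schema reach the compatibility
check; the other two cases subscribe directly.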

Thanks,
Bo

On Tue, Jan 3, 2023 at 10:16 Yunze Xu wrote:
>
> What I am concerned about is that if the old clients with other
> schemas (i.e. schema is neither null nor AUTO_CONSUME) subscribe to
> the topic with AUTO_CONSUME schema, what will happen? What's the
> schema compatibility check rule on a topic with AUTO_CONSUME schema?
>
> Thanks,
> Yunze
>
> On Mon, Jan 2, 2023 at 12:38 AM SiNan Liu  wrote:
> >
> > > 1. The values of Schema.Type and org.apache.pulsar.common.schema.SchemaType
> > > should be the same.
> > > 2. These changes do not affect producers; they only affect consumer
> > > subscribe behavior.
> > > 3. Backward compatibility:
> > > (1) In org.apache.pulsar.broker.service.ServerCnx#handleSubscribe:
> > >     if (schema != null && schema.getType() != SchemaType.AUTO_CONSUME) {
> > >         return topic.addSchemaIfIdleOrCheckCompatible(schema)
> > >                 .thenCompose(v -> topic.subscribe(option));
> > >     } else {
> > >         return topic.subscribe(option);
> > >     }
> > > With an older Pulsar client, the schema is null when an AUTO_CONSUME
> > > consumer subscribes to the topic.
> > > With a new Pulsar client, when an AUTO_CONSUME consumer subscribes to the
> > > topic, the schema is not null and schema.getType() == SchemaType.AUTO_CONSUME.
> > > In both cases, the broker returns topic.subscribe(option).
> > >
> > > (2) In org.apache.pulsar.broker.service.persistent.PersistentTopic
> > > #addSchemaIfIdleOrCheckCompatible:
> > >     @Override
> > >     public CompletableFuture<Void> addSchemaIfIdleOrCheckCompatible(SchemaData schema) {
> > >         return hasSchema().thenCompose((hasSchema) -> {
> > >             int numActiveConsumersWithoutAutoSchema = subscriptions.values().stream()
> > >                     .mapToInt(subscription -> subscription.getConsumers().stream()
> > >                             .filter(consumer -> consumer.getSchemaType() != SchemaType.AUTO_CONSUME)
> > >                             .toList().size())
> > >                     .sum();
> > >             if (hasSchema
> > >                     || (!producers.isEmpty())
> > >                     || (numActiveConsumersWithoutAutoSchema != 0)
> > >                     || (ledger.getTotalSize() != 0)) {
> > >                 return checkSchemaCompatibleForConsumer(schema);
> > >             } else {
> > >                 return addSchema(schema).thenCompose(schemaVersion ->
> > >                         CompletableFuture.completedFuture(null));
> > >             }
> > >         });
> > >     }
> > > There is only one case in which a bug remains.
> > > First, an old Pulsar client consumes the empty topic with an AUTO_CONSUME
> > > consumer schema, and then a new or old Pulsar client consumes the topic
> > > with another schema (i.e. the schema is AVRO).
> > > The broker will return the error IncompatibleSchemaException ("Topic does
> > > not have a schema to check"). The bug in issue 17354 is not fixed in this
> > > case.
> > > All other cases behave normally.
> >
> > On Sat, Dec 31, 2022 at 20:23 Yunze Xu wrote:
> >
> > > Defining `AutoConsume` as -3 is somehow strange. Could you clarify if
> > > backward compatibility is guaranteed? i.e. if the new Pulsar client
> > > uploaded the AUTO_CONSUME schema to the broker, can the old Pulsar
> > > clients produce or consume the same topic anymore?
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Fri, Dec 30, 2022 at 11:32 PM 思楠刘  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19113.
> > > >
> > > > Thanks,
> > > > Sinan
> > >


Re: [ANNOUNCE] Yunze Xu as a new PMC member in Apache Pulsar

2022-12-29 Thread
Congratulations! Yunze

Thanks,
Bo

On Thu, Dec 29, 2022 at 20:45 Max Xu wrote:
>
> Congratulations! Yunze
>
> Best,
> Max Xu
>
>
> On Thu, Dec 29, 2022 at 8:42 PM Haiting Jiang 
> wrote:
>
> > Hi all,
> >
> > The Apache Pulsar Project Management Committee (PMC) has invited Yunze Xu
> > (https://github.com/BewareMyPower) as a member of the PMC and we are
> > pleased to announce that he has accepted.
> >
> > He has been very active in the community over the past few years and has
> > made a lot of great contributions.
> >
> > Welcome Yunze to the Apache Pulsar PMC.
> >
> > Best Regards,
> > Haiting Jiang on behalf of the Pulsar PMC
> >


Re: [DISCUSS] Reject create non-existent persistent partitions.

2022-12-28 Thread
Hi Mattison,

> Should we allow the user to create the non-partitioned topic name like 
> `persistent://tenant/namespace/localname-partition-0`?

I think we should disallow the creation, because such a topic would
make the partition metadata incorrect.

If the current behavior is to allow the creation, modifying it would
be a breaking change. We can include this modification in the next
version; there is no need to cherry-pick it to the old branches.
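
A minimal sketch of the kind of name check I have in mind. It assumes
the `-partition-N` suffix shown in the topic name above is what marks a
partition; the class and method names are hypothetical and this is not
the `TopicName` implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; only illustrates the naming rule discussed here.
final class PartitionNameSketch {
    // Matches a trailing "-partition-<digits>" (at most 9 digits to avoid overflow).
    private static final Pattern SUFFIX = Pattern.compile("-partition-(\\d{1,9})$");

    // Returns the partition index encoded in the name, or -1 if there is none.
    static int partitionIndex(String topic) {
        Matcher m = SUFFIX.matcher(topic);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // True when a request to create `topic` as a non-partitioned topic
    // would collide with the partition naming scheme.
    static boolean looksLikePartition(String topic) {
        return partitionIndex(topic) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(looksLikePartition("persistent://tenant/namespace/localname-partition-0"));
        System.out.println(looksLikePartition("persistent://tenant/namespace/localname"));
    }
}
```

A broker-side check like this would still need to look at the
partitioned-topic metadata before rejecting the request.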

Thanks,
Bo



On Thu, Dec 29, 2022 at 13:33  wrote:
>
> Hi, All
>
> I have another question that needs discussion.
>
> Should we allow the user to create the non-partitioned topic name like 
> `persistent://tenant/namespace/localname-partition-0`?
>
> If so, this is a little confusing with the partitioned topic.
>
> e.g.:
> TopicName#isPartitioned method.
>
> Best,
> Mattison
> On Dec 28, 2022, 12:43 +0800, mattisonc...@gmail.com, wrote:
> > Hi, All
> >
> > I'd like to start a discussion of this behaviour change as follows.
> >
> > The issue is described here:  https://github.com/apache/pulsar/issues/19085
> > And the fix PR here:  https://github.com/apache/pulsar/pull/19086
> >
> > ---
> >
> > Behaviour change:
> >
> > Before: we can create non-existent persistent partitions.
> >
> > After: we will get `PulsarClientException.TopicDoesNotExistException` when 
> > we create non-existent persistent partitions.
> >
> > Please feel free to leave comments if you have any concerns.
> >
> > Best,
> > Mattison


Re: [VOTE] Pulsar Release 2.9.4 Candidate 3

2022-12-28 Thread
Thank you all,

Closing the vote with 3 binding +1s (PengHui, Haiting, Hang) and 4
non-binding +1s (Xiangying, Yunze).

I will continue the release process.

Thanks,
Bo

On Wed, Dec 28, 2022 at 18:29 Hang Chen wrote:

>
> +1 (binding)
>
> Verified
>  - Checksum and signatures
>  - Build from source with JDK8 and maven 3.8.6
>  - Checked BookKeeper so lib
>  - Start standalone cluster and run pulsar perf produce and consume
>  - Run pulsar-lakehouse-connectors based on this release
>
> Thanks,
> Hang
>
> On Sun, Dec 25, 2022 at 16:43 Haiting Jiang wrote:
> >
> > +1 binding
> >
> > - Checksum and signatures
> > - Built from sources with JDK11
> > - Run Pulsar standalone
> > - Validate Pub/Sub and Java Functions
> > - Validate Connectors
> > - Validate Stateful Functions
> > - Run a simple performance check
> >
> > Thanks,
> > Haiting
> >
> > On Sun, Dec 25, 2022 at 3:25 PM Haiting Jiang  
> > wrote:
> > >
> > > Hi Tison and Enrico,
> > >
> > > Thanks for your information.
> > >
> > > Hi Bo,
> > >
> > > > I look at some discussions, 
> > > > https://github.com/apache/pulsar/issues/12166.
> > > > Maybe we need to upgrade the version of the bookkeeper, I am not sure
> > > > whether we should upgrade the bookkeeper in branch-2.9.
> > >
> > > From the discussion in the issue, I think we should not add support
> > > for Apple M1 in old branches.
> > > We don't do major version upgrades of BK in pulsar minor versions.
> > >
> > > And I will continue to verify this version on the Intel chips based 
> > > laptops.
> > >
> > > Thanks,
> > > Haiting
> > >
> > > On Sat, Dec 24, 2022 at 8:35 PM Enrico Olivelli  
> > > wrote:
> > > >
> > > > > On Sat, Dec 24, 2022 at 11:44 丛搏 wrote:
> > > >
> > > > > thanks for the information
> > > > >
> > > > > Thanks,
> > > > > Bo
> > > > >
> > > > > > On Sat, Dec 24, 2022 at 18:25 tison wrote:
> > > > > >
> > > > > > Well it's easy to find:
> > > > > > https://github.com/apache/pulsar/issues/12166#issuecomment-1237601981
> > > > > >
> > > > > > > On Sat, Dec 24, 2022 at 18:22 tison wrote:
> > > > > >
> > > > > > > > Hi Haiting,
> > > > > > > >
> > > > > > > > I think it's 2.11. You can search the issues for this error message. I
> > > > > > > > remember referring to such an issue previously.
> > > > > > > >
> > > > > > > > Sorry, I'm out right now, so I cannot do the search for you.
> > > > > > >
> > > > > > > > On Sat, Dec 24, 2022 at 17:20 Haiting Jiang wrote:
> > > > > > >
> > > > > > >> Hi Bo,
> > > > > > >>
> > > > > > > >> Starting standalone failed with the following errors on an M1 Mac:
> > > > > > > >> ```
> > > > > > > >> 2022-12-24T17:08:58,944+0800 [main] ERROR
> > > > > > > >> org.apache.pulsar.PulsarStandaloneStarter - Failed to start pulsar
> > > > > > > >> service.
> > > > > > > >> java.io.IOException: Failed to load RocksDB JNI library
> > > > > > > >> at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:98)
> > > > > > > >> ~[org.apache.bookkeeper-bookkeeper-server-4.14.5.jar:4.14.5]
> > > > > > > >> at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:89)
> > > > > > > >> ~[org.apache.bookkeeper-bookkeeper-server-4.14.5.jar:4.14.5]
> > > > > > > >> at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:63)
> > > > > > > >> ~[org.apache.bookkeeper-bookkeeper-server-4.14.5.jar:4.14.5]
> > > > > > > >> at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:68)
> > > > > > > >> ~[org.apache.bookkeeper-bookkeeper-server-4.14.5.jar:4.14.5]
> > > 

Re: [DISCUSS] Reject create non-existent persistent partitions.

2022-12-28 Thread
Hi Mattison,

I'm not sure if this is the current behavior; I left a comment in the PR.

Thanks,
Bo

On Wed, Dec 28, 2022 at 16:27 Enrico Olivelli wrote:
>
> I agree with you.
>
> Please note that the new test case is about non-persistent topics.
>
> Is that expected?
>
> Enrico
>
> On Wed, Dec 28, 2022 at 07:58 Yubiao Feng wrote:
> >
> > Hi qiang
> >
> > I think this is a necessary fix, and it would be nice if more explicit
> > errors were given to the client.
> >
> > Thanks
> > Yubiao
> >
> > On Wed, Dec 28, 2022 at 12:43 PM  wrote:
> >
> > > Hi, All
> > >
> > > > I'd like to start a discussion of this behaviour change as follows.
> > >
> > > The issue is described here:
> > > https://github.com/apache/pulsar/issues/19085
> > > And the fix PR here:  https://github.com/apache/pulsar/pull/19086
> > >
> > > ---
> > >
> > > Behaviour change:
> > >
> > > Before: we can create non-existent persistent partitions.
> > >
> > > After: we will get `PulsarClientException.TopicDoesNotExistException` when
> > > we create non-existent persistent partitions.
> > >
> > > Please feel free to leave comments if you have any concerns.
> > >
> > > Best,
> > > Mattison
> > >


Re: [VOTE] Pulsar Release 2.10.3 Candidate 1

2022-12-27 Thread
+1 (non-binding)

system: mac os 12.3.1, Intel
maven: 3.6.1
java: OpenJDK 17.0.1

- Checked the signature
- Checked LICENSE
- Start standalone
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package
- Run a simple transaction performance check

Thanks,
Bo

On Sun, Dec 25, 2022 at 20:42 Xiangying Meng wrote:
>
> This is the first release candidate for Apache Pulsar, version 2.10.3.
>
> This release contains 155 commits by 50 contributors.
> https://github.com/apache/pulsar/compare/v2.10.2...v2.10.3-candidate-1
>
> *** Please download, test and vote on this release. This vote will stay open
> for at least 72 hours ***
>
> Note that we are voting upon the source (tag), binaries are provided for
> convenience.
>
> Source and binary files:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.10.3-candidate-1/
>
> SHA-512 checksums:
> 64518096acf4c2a5ef1dcc936cd645217291254cd5c18337a743db5b4caa70a48cfc969643fd18a16ba24421952155b597e1b84be997447fe21f0b12a0555cb1
> apache-pulsar-2.10.3-bin.tar.gz
> ee542d64d4aa288200c06f42d71186e8797480263ab84aaeb50ac683d6ea675c298adf8207b3aa98dae378b9fc84e9ba3dc78902397a774d1756d5e1739ab475
> apache-pulsar-2.10.3-src.tar.gz
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1201/
>
> The tag to be voted upon:
> v2.10.3-candidate-1 (b69f4efa6058c3f51885a61a2b3acb46f8b730f4)
> https://github.com/apache/pulsar/releases/tag/v2.10.3-candidate-1
>
> Pulsar's KEYS file containing PGP keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/pulsar/KEYS
>
> Docker images:
>
> 
> https://hub.docker.com/layers/xiangyingmeng/pulsar/2.10.3/images/sha256-9034eca8a61c7edc6d5b3fb5aa214f2dbb45f89d00c4ac875931ca588645dc96?context=repo
>
> 
> https://hub.docker.com/layers/xiangyingmeng/pulsar-all/2.10.3/images/sha256-c7a0323bf59f20ae29b362815302e272109453b8210f130a1daaa7b38918f884?context=repo
>
> Please download the source package, and follow the README to build
> and run the Pulsar standalone service.


Re: [VOTE] Pulsar Release 2.11.0 Candidate-4

2022-12-27 Thread
+1 (non-binding)

system: mac os 12.6, Apple M1
maven: 3.8.5
java: OpenJDK 17.0.3

- Checked the signature
- Checked LICENSE
- Start standalone with zookeeper stream storage
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package
- Run a simple transaction performance check

Thanks,
Bo


Re: [VOTE] Pulsar Release 2.9.4 Candidate 3

2022-12-24 Thread
On Wed, Dec 21, 2022 at 7:17 PM Yunze Xu 
> >> wrote:
> >> >
> >> > +1 (non-binding)
> >> >
> >> > - Checked the signature
> >> > - Build from source
> >> > - Start standalone with KoP 2.9.3.20
> >> > - Verified Pulsar client 2.9.4, master (05e6f5e3), and Kafka clients
> >> > 3.3.1 (through KoP)
> >> >
> >> > Thanks,
> >> > Yunze
> >> >
> >> > On Mon, Dec 19, 2022 at 11:13 AM Xiangying Meng 
> >> wrote:
> >> > >
> >> > > +1 (non-binding)
> >> > >
> >> > > - Start standalone
> >> > > - Validate Pub/Sub and Java Functions
> >> > > - Validate Cassandra connector
> >> > > - Validate Stateful Functions
> >> > >
> >> > > Thanks,
> >> > > Xiangying
> >> > >
> >> > > On Thu, Dec 15, 2022 at 10:25 AM PengHui Li 
> >> wrote:
> >> > >
> >> > > > +1 (binding)
> >> > > >
> >> > > > - Checked the signature
> >> > > > - Start standalone
> >> > > > - Publish and consume messages
> >> > > > - Verified Function and State Function
> >> > > > - Verified Cassandra connector
> >> > > > - Build from the source package
> >> > > >
> >> > > > Thanks,
> >> > > > Penghui
> >> > > >
> >> > > > On Tue, Dec 13, 2022 at 7:49 PM 丛搏  wrote:
> >> > > >
> >> > > > > This is the third release candidate for Apache Pulsar, version 2.9.4.
> >> > > > >
> >> > > > > This release contains 319 commits by 69 contributors.
> >> > > > > https://github.com/apache/pulsar/compare/v2.9.3...v2.9.4-candidate-3
> >> > > > >
> >> > > > > *** Please download, test and vote on this release. This vote will stay
> >> > > > > open for at least 72 hours ***
> >> > > > >
> >> > > > > Note that we are voting upon the source (tag), binaries are provided for
> >> > > > > convenience.
> >> > > > >
> >> > > > > Source and binary files:
> >> > > > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.9.4-candidate-3/
> >> > > > >
> >> > > > > SHA-512 checksums:
> >> > > > >
> >> > > > > 85cd920c8fedcec2551867e1ea89052c8578634e95226f92c4114d17587e7d2821f8033ef6fc70103e0b21dd3f8f9b907c68209cdc2cb74eca08f0a3ae6bd98c
> >> > > > >  apache-pulsar-2.9.4-bin.tar.gz
> >> > > > >
> >> > > > > da6ee53ffc66e4d9f60c74935c3ed0d85b26f5a629cb50fdfc02f535d66492297932256e4e44c8d4a08d20a85c4f490b7d7b3e169756bc246690bedfe582892b
> >> > > > >  apache-pulsar-2.9.4-src.tar.gz
> >> > > > >
> >> > > > > Maven staging repo:
> >> > > > > https://repository.apache.org/content/repositories/orgapachepulsar-1198/
> >> > > > >
> >> > > > > The tag to be voted upon:
> >> > > > > v2.9.4-candidate-3 (e949f18a20c6f8f5b6f326cd4afb814d0fb3b8be)
> >> > > > > https://github.com/apache/pulsar/releases/tag/v2.9.4-candidate-3
> >> > > > >
> >> > > > > Pulsar's KEYS file containing PGP keys we use to sign the release:
> >> > > > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >> > > > >
> >> > > > > Docker images:
> >> > > > >
> >> > > > > https://hub.docker.com/layers/congbobo184/pulsar/2.9.4/images/sha256-72272e9b7ce5c568575bacbddf7565fd570d27b486f2f47cafaa0938ec56e1ef
> >> > > > >
> >> > > > > https://hub.docker.com/layers/congbobo184/pulsar-all/2.9.4/images/sha256-c17d42831a882028996627abe56e71e067b905fdaac91ca3bdc933d51ce5b73b
> >> > > > >
> >> > > > >
> >> > > > > Please download the source package, and follow the README to build
> >> > > > > and run the Pulsar standalone service.
> >> > > > >
> >> > > >
> >>
> > --
> > Best,
> > tison.
> >
> --
> Best,
> tison.
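
The release mail quoted above publishes SHA-512 checksums for the binary and source tarballs. As a minimal sketch of what checking one involves, the Java snippet below hashes a byte array and compares the hex digest with an expected value. `Sha512Check` and its methods are hypothetical names made up for this illustration and are not part of any Pulsar tooling; a command-line tool such as `sha512sum` does the same job.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Pulsar: hashes bytes with SHA-512 and
// compares the hex digest with a value copied from a release mail.
final class Sha512Check {
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a standard JDK algorithm, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    static boolean matches(byte[] data, String expectedHex) {
        // Published checksums may differ in case or carry surrounding spaces.
        return sha512Hex(data).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) {
        byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha512Hex(sample));
    }
}
```

For a real artifact, the bytes could come from `java.nio.file.Files.readAllBytes`, although a streaming read would be kinder to memory for large tarballs.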


Re: [VOTE] Pulsar Release 2.9.4 Candidate 3

2022-12-24 Thread
ng
> > >
> > > On Thu, Dec 15, 2022 at 10:25 AM PengHui Li  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > - Checked the signature
> > > > - Start standalone
> > > > - Publish and consume messages
> > > > - Verified Function and State Function
> > > > - Verified Cassandra connector
> > > > - Build from the source package
> > > >
> > > > Thanks,
> > > > Penghui
> > > >
> > > > On Tue, Dec 13, 2022 at 7:49 PM 丛搏  wrote:
> > > >
> > > > > This is the third release candidate for Apache Pulsar, version 2.9.4.
> > > > >
> > > > >
> > > > > This release contains 319 commits by 69 contributors.
> > > > > https://github.com/apache/pulsar/compare/v2.9.3...v2.9.4-candidate-3
> > > > >
> > > > > *** Please download, test and vote on this release. This vote will 
> > > > > stay
> > > > > open
> > > > > for at least 72 hours ***
> > > > >
> > > > > Note that we are voting upon the source (tag), binaries are provided 
> > > > > for
> > > > > convenience.
> > > > >
> > > > > Source and binary files:
> > > > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.9.4-candidate-3/
> > > > >
> > > > > SHA-512 checksums:
> > > > >
> > > > >
> > > > 85cd920c8fedcec2551867e1ea89052c8578634e95226f92c4114d17587e7d2821f8033ef6fc70103e0b21dd3f8f9b907c68209cdc2cb74eca08f0a3ae6bd98c
> > > > >  apache-pulsar-2.9.4-bin.tar.gz
> > > > >
> > > > >
> > > > da6ee53ffc66e4d9f60c74935c3ed0d85b26f5a629cb50fdfc02f535d66492297932256e4e44c8d4a08d20a85c4f490b7d7b3e169756bc246690bedfe582892b
> > > > >  apache-pulsar-2.9.4-src.tar.gz
> > > > >
> > > > > Maven staging repo:
> > > > > https://repository.apache.org/content/repositories/orgapachepulsar-1198/
> > > > >
> > > > > The tag to be voted upon:
> > > > > v2.9.4-candidate-3 (e949f18a20c6f8f5b6f326cd4afb814d0fb3b8be)
> > > > > https://github.com/apache/pulsar/releases/tag/v2.9.4-candidate-3
> > > > >
> > > > > Pulsar's KEYS file containing PGP keys you use to sign the release:
> > > > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > > > >
> > > > > Docker images:
> > > > >
> > > > > 
> > > > >
> > > > >
> > > > https://hub.docker.com/layers/congbobo184/pulsar/2.9.4/images/sha256-72272e9b7ce5c568575bacbddf7565fd570d27b486f2f47cafaa0938ec56e1ef
> > > > >
> > > > > 
> > > > >
> > > > >
> > > > https://hub.docker.com/layers/congbobo184/pulsar-all/2.9.4/images/sha256-c17d42831a882028996627abe56e71e067b905fdaac91ca3bdc933d51ce5b73b
> > > > >
> > > > >
> > > > > Please download the source package, and follow the README to build
> > > > > and run the Pulsar standalone service.
> > > > >
> > > >


Re: [DISCUSS] The use of consumer redeliverUnacknowledgedMessages method

2022-12-23 Thread
Hi, Baodi,

> Alternatively, can it be understood that when a user wants to process messages
> in an orderly manner, he cannot call the `reconsumeLater` and
> `negativeAcknowledge` methods?

Yes, I think so. This is something the user needs to think about. We need
to provide use cases and notes that tell users what they should do to
keep messages in order.

Thanks,
Bo
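
The batch-receive loop with a synchronous rewind that is discussed in this thread can be tried out without a broker. The following is a minimal Java sketch under stated assumptions: `RewindSketch`, `rewind()`, and the `failOnceAt` parameter are hypothetical names invented for this illustration, and the "topic" is just an in-memory list with a mark-delete index. It is not the Pulsar client implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for an Exclusive/Failover consumer.
// None of these names are real Pulsar APIs; rewind() is the proposed method.
final class RewindSketch {
    private final List<Integer> topic; // messages in publish order
    private int markDelete = 0;        // index of the first unacknowledged message
    private int readPosition = 0;      // index of the next message to deliver

    RewindSketch(List<Integer> topic) {
        this.topic = topic;
    }

    List<Integer> batchReceive(int max) {
        int end = Math.min(topic.size(), readPosition + max);
        List<Integer> batch = new ArrayList<>(topic.subList(readPosition, end));
        readPosition = end;
        return batch;
    }

    // Cumulative ack: everything delivered so far counts as acknowledged.
    void acknowledgeCumulative() {
        markDelete = readPosition;
    }

    // Synchronous rewind: the next batchReceive starts at the mark-delete position.
    void rewind() {
        readPosition = markDelete;
    }

    // Runs the receive/process/ack loop and returns the messages that were
    // processed successfully. Processing fails once, the first time a batch
    // contains failOnceAt.
    static List<Integer> run(List<Integer> topic, int batchSize, int failOnceAt) {
        RewindSketch consumer = new RewindSketch(topic);
        List<Integer> processed = new ArrayList<>();
        boolean alreadyFailed = false;
        while (true) {
            List<Integer> batch = consumer.batchReceive(batchSize);
            if (batch.isEmpty()) {
                break;
            }
            if (!alreadyFailed && batch.contains(failOnceAt)) {
                alreadyFailed = true;
                consumer.rewind(); // nothing from this batch is recorded
                continue;
            }
            processed.addAll(batch);
            consumer.acknowledgeCumulative();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4, 5, 6), 2, 4));
    }
}
```

With the topic 1..6, a batch size of 2, and a single failure on the batch containing 4, `run` returns the messages in publish order with no duplicates, because nothing is recorded for the failed batch and the rewind restarts at the last cumulative ack.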

On Fri, Dec 23, 2022 at 4:24 PM Baodi Shi wrote:
>
> Hi, Congbo:
>
>
> > Both of the above solutions can keep messages in order. But in the
> > first solution, we don't know how many messages the user processes
> > before one cumulative ack. If there are 1000 messages, maybe the
> > user can't store them in memory for reprocessing, so users
> > need a method to redeliver these messages.
>
>
> I agree with adding the rewind interface.
>
>
> > Failover also can be individual ack, so we can't disable
> > `reconsumeLater` and `negativeAcknowledge`.
>
> In the Failover subscription model, what is the individual ack scenario?
>
> Alternatively, can it be understood that when a user wants to process
> messages in an orderly manner, he cannot call the `reconsumeLater` and
> `negativeAcknowledge` methods?
>
>
> > On Dec 23, 2022, at 11:30 AM, 丛搏 wrote:
> >
> > Hi, Asaf, Baodi:
> >
> > I'm very sorry for my late reply. Thanks for your discussion.
> >
> >> - receive() - get the following message
> >>  - cumulativeAck(msgId) - acknowledge all messages up to msgId.
> >>- Maybe we can try to come up with a self-explanatory name like
> >>ackAllUpTo(msgId).
> >
> > If the user wants the messages in order, `receive()` and
> > `cumulativeAck()` must be called from a single thread. Otherwise,
> > `cumulativeAck` loses its meaning.
> >
> > If users use cumulative ack with code like:
> > ```
> > while (true) {
> >Message message = consumer.receive();
> >process(message);
> >consumer.acknowledgeCumulative(message.getMessageId());
> > }
> > ```
> > I think this is not a good way to use `acknowledgeCumulative`,
> > because a single message doesn't need a cumulative ack; it's meaningless.
> > Code that uses `acknowledgeCumulative` should look like:
> > ```
> > while (true) {
> >Messages messages = consumer.batchReceive();
> >process(messages);
> >consumer.acknowledgeCumulative(messages);
> > }
> > ```
> > Then we should think about how the user reprocesses these messages
> > when `process(messages);` throws an exception.
> >
> > 1. In one case the user reprocesses these messages; the
> > `process(messages)` code looks like:
> > ```
> > private void process(Messages messages) {
> >    try {
> >        // do something
> >    } catch (Exception e) {
> >        process(messages);
> >    }
> > }
> > ```
> > In this way, the consumer doesn't need to do anything.
> >
> > 2. Pulsar rewinds the cursor and redelivers these messages:
> >
> > ```
> > while (true) {
> >    Messages messages = consumer.batchReceive();
> >    try {
> >        process(messages);
> >    } catch (Exception e) {
> >        // This method redelivers the messages, whatever its name ends up
> >        // being. Until it succeeds, the consumer can't invoke
> >        // consumer.batchReceive() again.
> >        consumer.rewind();
> >        continue;
> >    }
> >    consumer.acknowledgeCumulative(messages);
> > }
> > ```
> > In this way, the consumer needs a method that can redeliver these
> > messages. `redeliverUnacknowledgedMessages` is an async method that
> > can't guarantee the messages are in order, so we need a new method,
> > and it must be a sync method.
> > <<<<<<<<<<<<<
> >
> > Both of the above solutions can keep messages in order. But in the
> > first solution, we don't know how many messages the user processes
> > before one cumulative ack. If there are 1000 messages, maybe the
> > user can't store them in memory for reprocessing, so users
> > need a method to redeliver these messages.
> >
> > < I think we should disable nack under Exclusive/Failover subscription.
> >
> > Failover also can be individual ack, so we can't disable
> > `reconsumeLater` and `negativeAcknowledge`.
> >
> > Thanks,
> > Bo
> >
> > On Sun, Dec 18, 2022 at 6:36 PM Asaf Mesika wrote:
> >>
> >> Hi Baodi,
> >>
> >> Yes, in effect, I 

Re: [DISCUSS] Move GitHub discussion notifications to commits@

2022-12-22 Thread
+1 for pointing to the user list.

Thanks,
Bo

On Fri, Dec 23, 2022 at 12:48 PM Dave Fisher wrote:
>
>
>
> Sent from my iPhone
>
> > On Dec 22, 2022, at 7:00 PM, PengHui Li  wrote:
> >
> > Is it better to point to user@ list?
> > I noticed that most discussions are user-facing issues(Requirements,
> > Questions)
>
> I agree.
>
> In addition, we should discuss in another thread how to guide user and dev 
> discussions. Guide where each occurs.
>
> Regards,
> Dave
> >
> > Thanks,
> > Penghui
> >
> >> On Fri, Dec 23, 2022 at 10:18 AM Yunze Xu 
> >> wrote:
> >>
> >> +1
> >>
> >> Thanks,
> >> Yunze
> >>
> >>> On Fri, Dec 23, 2022 at 6:04 AM Michael Marshall 
> >>> wrote:
> >>>
> >>> Hello Pulsar Community,
> >>>
> >>> We've had GitHub discussions enabled for a while now. I think they're
> >>> a helpful way for users to ask questions. I also think they are
> >>> creating too many notifications to the dev@ list, which can make it
> >>> hard to keep up with PIPs, VOTEs, and other dev discussions.
> >>>
> >>> I propose we move these notifications to the commits@ list. This
> >>> change will make all GitHub notifications go to commits@.
> >>>
> >>> Any user/contributor that would like to receive discussion
> >>> notifications can subscribe via GitHub or via the commits mailing
> >>> list.
> >>>
> >>> Here is the relevant PR: https://github.com/apache/pulsar/pull/19033
> >>>
> >>> What are your thoughts? If there are no objections in the next week, I
> >>> propose we merge the PR.
> >>>
> >>> Thanks,
> >>> Michael
> >>
>


Re: [DISCUSS] The use of consumer redeliverUnacknowledgedMessages method

2022-12-22 Thread
>
> >
> > On Nov 30, 2022, at 12:19 AM, Asaf Mesika wrote:
> >
> > Ok, I'll try to summarize what I read here to make sure we're all on the
> > same page :)
> >
> > Exclusive and Failover subscription types are subscriptions that guarantee
> > two things:
> > 1. Single active consumer per topic (partition).
> > 2. Message processing in the order they were written to the
> > topic (partition).
> >
> > (1) is guaranteed by the broker by allowing only a single consumer per
> > topic.
> > (2) is guaranteed by the broker. Since we only have a single consumer, the
> > only thing for the broker to take care of is delivering messages precisely
> > in the same order they were received.
> >
> > Normal dispatching dispatches messages in the order written to the topic.
> > When the consumer calls redeliverUnacknowledgedMessages(), it clears the
> > incoming queue, and the broker rewinds the cursor to the mark delete
> > position, disregarding any individual acks done after the mark delete. So
> > messages are always delivered without any gaps.
> >
> > Since the queue is empty, the next receive() call will block until the
> > broker redelivers the messages and fills the consumer's internal queue.
> >
> > The problem not raised in this discussion thread is the client
> > implementation of negativeAcknowledge().
> >
> > Negative acknowledgment in today's implementation adds the
> > negatively-acked message into the NegativeAckTracker, and sets a
> > timer, if not already present, to send all pending acks in X seconds. Once
> > that time is up, it sees that the negative ack belongs to an
> > Exclusive/Failover subscription type and hence translates that into
> > redeliverUnacknowledgedMessages(). So in X seconds, it clears the queue and
> > asks for messages to be redelivered. Since adding to NegativeAckTracker is
> > an immediate action (add a message to the queue and return), it just
> > returns. If you receive() 1,2,3, call nack(4) and then receive() and get
> > 4,5,6,7,... After X seconds pass, your next receive suddenly gives you
> > 4,5,6 again.
> >
> > So in effect, what you would have expected here is that nack(4) in
> > exclusive/failover will happen immediately: clear the queue, write the
> > redeliver command to the broker async and return immediately, hence the
> > next receive() will block until messages have been received.
> >
> > I do side with the suggestion to change the API for exclusive / failover
> > to be more clear.
> > In those types of subscriptions, it seems that the only actions you are
> > supposed to do are:
> >
> > 1. receive(): get the next message.
> > 2. cumulativeAck(msg): acknowledge that all messages up to msg have been
> > successfully processed.
> > 3. redeliverUnacknowledgedMessages(): clear the internal queue and ask the
> > broker to resend messages from the last mark delete position.
> >
> > There is one additional action in which you explicitly push the messages to
> > a different topic or even the same topic, and that is:
> > 4. reconsumeLater(msg): ack the existing message and write it to the same
> > topic or a different one. This is an explicit out-of-order consumption,
> > but it can be clearly stated in docs.
> >
> > I think we should have a different consumer interface holding those
> > commands above.
> >
> > On Thu, Nov 24, 2022 at 1:43 PM 丛搏 wrote:
> >
> > > Hi, Joe:
> > >
> > >> This "brokenness" is not clear to me.
> > > https://github.com/apache/pulsar/pull/10478 This PIP solves some
> > > problems of "brokenness".
> > >> The sequence 3,4,5,6,7,8,9,10,11 12,13,14,15, 16
> > >> ,9,10,11,12,13,14,15,16,17, 18, 19, 20 ...does not break
> > >> the ordering guarantees of Pulsar
> > > If we don't use transaction ack, this order is fine. But when we use
> > > transaction ack, in this case, message 9 and message 10 will be
> > > handled twice. Therefore, we need redeliver and receive to be
> > > synchronized, to ensure that messages received before the redeliver are
> > > ordered and not repeated, and are not consumed again after the
> > > redeliver. To achieve these goals, redeliver needs to be a
> > > synchronous method instead of async, and it needs to retry automatically.
> > >
> >
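
The delayed negative-ack behaviour summarized above (the nack returns at once, and only later does the client clear its queue and restart from the mark delete position) can be modelled in a few lines. This is a simplified Java sketch under stated assumptions: `DelayedNackSketch` and `timerFires()` are hypothetical names, time is simulated by calling `timerFires()` by hand, and the message numbers are chosen for illustration. It is not the Pulsar client code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified, hypothetical model of a consumer whose negative ack is turned
// into a full redelivery only after a delay.
final class DelayedNackSketch {
    private final List<Integer> topic;
    private final Deque<Integer> incomingQueue = new ArrayDeque<>();
    private int markDelete = 0;          // index of the first message not cumulatively acked
    private boolean nackPending = false;

    DelayedNackSketch(List<Integer> topic) {
        this.topic = topic;
        refillFromMarkDelete();
    }

    private void refillFromMarkDelete() {
        incomingQueue.clear();
        incomingQueue.addAll(topic.subList(markDelete, topic.size()));
    }

    Integer receive() {
        return incomingQueue.poll();
    }

    void acknowledgeCumulative(int message) {
        markDelete = topic.indexOf(message) + 1;
    }

    // Only records the nack and returns immediately; nothing is redelivered yet.
    void negativeAcknowledge(int message) {
        nackPending = true;
    }

    // After the delay, the pending nack becomes a full redelivery: clear the
    // queue and restart from the mark-delete position.
    void timerFires() {
        if (nackPending) {
            nackPending = false;
            refillFromMarkDelete();
        }
    }

    static List<Integer> demo() {
        DelayedNackSketch consumer = new DelayedNackSketch(List.of(1, 2, 3, 4, 5, 6, 7, 8));
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            seen.add(consumer.receive());   // 1, 2, 3
        }
        consumer.acknowledgeCumulative(3);
        seen.add(consumer.receive());       // 4
        consumer.negativeAcknowledge(4);    // returns at once, no visible effect
        seen.add(consumer.receive());       // 5
        seen.add(consumer.receive());       // 6
        consumer.timerFires();              // the delayed redelivery kicks in
        seen.add(consumer.receive());       // 4 again
        seen.add(consumer.receive());       // 5 again
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the application sees 5 and 6 after the nack and then, once the timer fires, sees 4 and 5 a second time. An immediate, synchronous redelivery would instead make the receive call right after the nack start again at 4.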


Re: [VOTE] PIP-229: Add a common interface to get fields of MessageIdData

2022-12-22 Thread
+1 (non-binding)

Thanks,
Bo

On Thu, Dec 22, 2022 at 8:34 PM Yunze Xu wrote:
>
> Hi all,
>
> I'm starting the VOTE for PIP-229: Add a common interface to get
> fields of MessageIdData: https://github.com/apache/pulsar/issues/18950
>
> Discussion thread:
> https://lists.apache.org/thread/25rzflmkfmvxhf3my0ombnbpv7bvgy32
>
> The vote will be open for at least 3 days.
>
> Thanks,
> Yunze


Re: [DISCUSS] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-22 Thread
Hi Yunze,

+1(non-binding)

Thank you very much for your patience. I have no other problem now.

Great PIP!

Thanks,
Bo

On Thu, Dec 22, 2022 at 3:29 PM Yunze Xu wrote:
>
> Hi Penghui and Bo,
>
> I realized this problem as well. No new APIs should be added. I added
> the new API mainly to express that the semantics of seek will be
> modified while the acknowledge semantics remain unchanged. I have now
> removed the new APIs and updated this proposal.
>
> Thanks,
> Yunze
>
>
> On Thu, Dec 22, 2022 at 10:55 AM 丛搏  wrote:
> >
> > > Users can use
> > > ```
> > > MessageId.fromByteArrayWithTopic()
> > > ```
> > > or
> > > ```
> > > TopicMessageId.fromByteArray();
> > > TopicMessageId.create();
> > > ```
> >
> > I think this is a good point. `TopicName` not being in the `MessageId`
> > means that the multi-topics consumer must know that this topic is a
> > partitioned topic and must assign the `TopicName` to the `MessageId`,
> > so there is no need to add new interfaces.
> >
> > Thanks,
> > Bo
> >
> > On Thu, Dec 22, 2022 at 9:50 AM PengHui Li wrote:
> > >
> > > > Because `TopicMessageId` is constructed by the Pulsar Client library
> > > itself, which can guarantee `getOwnerTopic()` returns the correct
> > > topic name. The benefit of passing a `TopicMessageId` rather than the
> > > combination of a topic name and a `MessageId` is, users won't need to
> > > care about how to get the correct topic name for a given partition by
> > > themselves.
> > >
> > > This is a good point about why we should not introduce
> > > seek(TopicName, MessageId). It's better to also mention this part in the
> > > proposal. It will help us understand why seek(TopicName, MessageId) is
> > > not a good approach for Pulsar.
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Thu, Dec 22, 2022 at 9:48 AM PengHui Li  wrote:
> > >
> > > > > In short, `seek(msgId)` will call `seek(TopicMessageId)` if `msgId` is
> > > > a `TopicMessageId`.
> > > >
> > > > Does it look like we don't need to add the following new APIs?
> > > >
> > > > ```
> > > > void seek(TopicMessageId topicMessageId) throws PulsarClientException;
> > > > > CompletableFuture<Void> seekAsync(TopicMessageId topicMessageId);
> > > > ```
> > > >
> > > > Users can use
> > > > ```
> > > > MessageId.fromByteArrayWithTopic()
> > > > ```
> > > > or
> > > > ```
> > > > TopicMessageId.fromByteArray();
> > > > TopicMessageId.create();
> > > > ```
> > > > to construct a TopicMessageId instance.
> > > >
> > > > But we can use the existing seek API with the constructed TopicMessageId
> > > > instance
> > > >
> > > > ```
> > > > consumer.seek(MessageId messageId);
> > > > ```
> > > >
> > > > Thanks,
> > > > Penghui
> > > >
> > > > On Thu, Dec 22, 2022 at 12:35 AM Yunze Xu 
> > > > wrote:
> > > >
> > > >> > but it needs to cast the `TopicMessageId` from `MessageId`, which is
> > > >> very user-unfriendly.
> > > >>
> > > >> Sorry I think my proposal doesn't express it well. In my original
> > > >> thought, no cast is needed, please see the update in
> > > >> https://github.com/apache/pulsar/issues/18616.
> > > >>
> > > >> In short, `seek(msgId)` will call `seek(TopicMessageId)` if `msgId` is
> > > >> a `TopicMessageId`.
> > > >>
> > > >> Thanks,
> > > >> Yunze
> > > >>
> > > >> On Wed, Dec 21, 2022 at 11:26 PM 丛搏  wrote:
> > > >> >
> > > >> > > If you mean `msg.getTopicName()`, how can you declare it's better 
> > > >> > > than
> > > >> > > `msgId.getOwnerTopic()`?
> > > >> >
> > > >> > > It can. Because the `TopicMessageIdImpl` already contains the 
> > > >> > > correct
> > > >> > > topic. That's the point.
> > > >> >
> > > >> > ```
> > > >> > var msgId = (TopicMessageId) multiTopicsConsumer.getMessageId();
> > > >> > ```
> > > >> >  if `msgId.getOwnerTopic()` is the interface of `MessageId`, I have 
> > > >> > no
> > &

Re: [DISCUSS] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-21 Thread
> Users can use
> ```
> MessageId.fromByteArrayWithTopic()
> ```
> or
> ```
> TopicMessageId.fromByteArray();
> TopicMessageId.create();
> ```

I think this is a good point. `TopicName` not being in the `MessageId`
means that the multi-topics consumer must know that this topic is a
partitioned topic and must assign the `TopicName` to the `MessageId`,
so there is no need to add new interfaces.

Thanks,
Bo
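
The idea quoted in this thread, that the existing `seek(MessageId)` can take its topic from the id when the id is a `TopicMessageId`, can be sketched with simplified stand-in types. Everything below is an assumption made for illustration: `SketchMessageId`, `SketchTopicMessageId`, `PlainMessageId`, `PartitionMessageId`, and `MultiTopicsSeekSketch` are hypothetical names, the real Pulsar interfaces have more methods, and rejecting a plain id is a choice of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for MessageId and TopicMessageId; illustrative only.
interface SketchMessageId {
    long entryId();
}

interface SketchTopicMessageId extends SketchMessageId {
    String getOwnerTopic();
}

final class PlainMessageId implements SketchMessageId {
    private final long entryId;

    PlainMessageId(long entryId) {
        this.entryId = entryId;
    }

    public long entryId() {
        return entryId;
    }
}

final class PartitionMessageId implements SketchTopicMessageId {
    private final String ownerTopic;
    private final long entryId;

    PartitionMessageId(String ownerTopic, long entryId) {
        this.ownerTopic = ownerTopic;
        this.entryId = entryId;
    }

    public String getOwnerTopic() {
        return ownerTopic;
    }

    public long entryId() {
        return entryId;
    }
}

final class MultiTopicsSeekSketch {
    // Last seek position per owner topic (partition); -1 means "never sought".
    private final Map<String, Long> seekPositions = new HashMap<>();

    MultiTopicsSeekSketch(String... topics) {
        for (String topic : topics) {
            seekPositions.put(topic, -1L);
        }
    }

    // Single entry point: the topic comes from the id itself, so callers never
    // pass a topic name that could disagree with the id.
    void seek(SketchMessageId id) {
        if (!(id instanceof SketchTopicMessageId)) {
            throw new IllegalArgumentException("a multi-topics consumer needs a topic-aware id");
        }
        SketchTopicMessageId topicId = (SketchTopicMessageId) id;
        if (!seekPositions.containsKey(topicId.getOwnerTopic())) {
            throw new IllegalArgumentException("not subscribed to " + topicId.getOwnerTopic());
        }
        seekPositions.put(topicId.getOwnerTopic(), topicId.entryId());
    }

    long positionOf(String topic) {
        return seekPositions.get(topic);
    }
}
```

The caller writes `consumer.seek(id)` in every case; the type check lives inside the consumer, so no cast is needed in user code.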

On Thu, Dec 22, 2022 at 9:50 AM PengHui Li wrote:
>
> > Because `TopicMessageId` is constructed by the Pulsar Client library
> itself, which can guarantee `getOwnerTopic()` returns the correct
> topic name. The benefit of passing a `TopicMessageId` rather than the
> combination of a topic name and a `MessageId` is, users won't need to
> care about how to get the correct topic name for a given partition by
> themselves.
>
> This is a good point about why we should not introduce
> seek(TopicName, MessageId). It's better to also mention this part in the
> proposal. It will help us understand why seek(TopicName, MessageId) is
> not a good approach for Pulsar.
>
> Thanks,
> Penghui
>
> On Thu, Dec 22, 2022 at 9:48 AM PengHui Li  wrote:
>
> > > In short, `seek(msgId)` will call `seek(TopicMessageId)` if `msgId` is
> > a `TopicMessageId`.
> >
> > Does it look like we don't need to add the following new APIs?
> >
> > ```
> > void seek(TopicMessageId topicMessageId) throws PulsarClientException;
> > CompletableFuture<Void> seekAsync(TopicMessageId topicMessageId);
> > ```
> >
> > Users can use
> > ```
> > MessageId.fromByteArrayWithTopic()
> > ```
> > or
> > ```
> > TopicMessageId.fromByteArray();
> > TopicMessageId.create();
> > ```
> > to construct a TopicMessageId instance.
> >
> > But we can use the existing seek API with the constructed TopicMessageId
> > instance
> >
> > ```
> > consumer.seek(MessageId messageId);
> > ```
> >
> > Thanks,
> > Penghui
> >
> > On Thu, Dec 22, 2022 at 12:35 AM Yunze Xu 
> > wrote:
> >
> >> > but it needs to cast the `TopicMessageId` from `MessageId`, which is
> >> very user-unfriendly.
> >>
> >> Sorry I think my proposal doesn't express it well. In my original
> >> thought, no cast is needed, please see the update in
> >> https://github.com/apache/pulsar/issues/18616.
> >>
> >> In short, `seek(msgId)` will call `seek(TopicMessageId)` if `msgId` is
> >> a `TopicMessageId`.
> >>
> >> Thanks,
> >> Yunze
> >>
> >> On Wed, Dec 21, 2022 at 11:26 PM 丛搏  wrote:
> >> >
> >> > > If you mean `msg.getTopicName()`, how can you declare it's better than
> >> > > `msgId.getOwnerTopic()`?
> >> >
> >> > > It can. Because the `TopicMessageIdImpl` already contains the correct
> >> > > topic. That's the point.
> >> >
> >> > ```
> >> > var msgId = (TopicMessageId) multiTopicsConsumer.getMessageId();
> >> > ```
> >> > If `getOwnerTopic()` were part of the `MessageId` interface, I would
> >> > have no problem. But this needs a cast from `MessageId` to
> >> > `TopicMessageId`, which is very user-unfriendly and doesn't make sense.
> >> >
> >> > > I don't know what you're thinking about using
> >> > > `consumer.seek(msg.getTopicName(), msg.getMessageId()` for a single
> >> > > topic consumer. If it's accepted, and you want to unify the use case
> >> > > of `seek`, the original `seek` API should be deprecated and much
> >> > > existing code could be affected. If it's not accepted, users have to
> >> > > distinguish if a consumer is a multi-topics consumer.
> >> >
> >> > Likewise, `consumer.seek(TopicMessageId topicMessageId)` for a
> >> > single-topic consumer is also strange. My point is that either
> >> > `TopicName` belongs to `MessageId`, or the two should be separate.
> >> > Coupling them together is not a good interface design; it is very unclear.
> >> >
> >> > Thanks,
> >> > Bo
> >> >
> >> > On Wed, Dec 21, 2022 at 10:46 PM Yunze Xu wrote:
> >> > >
> >> > > > If messageID does not contain `TopicName`, the `TopicName` is best
> >> get from msg.
> >> > >
> >> > > If you mean `msg.getTopicName()`, how can you declare it's better than
> >> > > `msgId.getOwnerTopic()`?
> >> > >
> >> > > > but it still cannot avoid `TopicName` for marking t

Re: [DISCUSS] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-21 Thread
> If you mean `msg.getTopicName()`, how can you declare it's better than
> `msgId.getOwnerTopic()`?

> It can. Because the `TopicMessageIdImpl` already contains the correct
> topic. That's the point.

```
var msgId = (TopicMessageId) multiTopicsConsumer.getMessageId();
```
If `getOwnerTopic()` were part of the `MessageId` interface, I would
have no problem. But this needs a cast from `MessageId` to
`TopicMessageId`, which is very user-unfriendly and doesn't make sense.

> I don't know what you're thinking about using
> `consumer.seek(msg.getTopicName(), msg.getMessageId()` for a single
> topic consumer. If it's accepted, and you want to unify the use case
> of `seek`, the original `seek` API should be deprecated and much
> existing code could be affected. If it's not accepted, users have to
> distinguish if a consumer is a multi-topics consumer.

Likewise, `consumer.seek(TopicMessageId topicMessageId)` for a
single-topic consumer is also strange. My point is that either
`TopicName` belongs to `MessageId`, or the two should be separate.
Coupling them together is not a good interface design; it is very unclear.

Thanks,
Bo
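
The design argument quoted in this thread, that a parameter with exactly one valid value (determined by another parameter) only adds a way to fail, can be shown in miniature. The Java sketch below mirrors the hypothetical `acknowledge(numberOfMessages, msgIds)` example from the discussion; `DependentArgumentSketch` and both method shapes are made up for illustration and are not Pulsar APIs.

```java
import java.util.List;

// Hypothetical illustration of a "dependent argument"; not a Pulsar API.
final class DependentArgumentSketch {
    // Shape 1: numberOfMessages must equal ids.size(), so the extra
    // parameter carries no information and can only be wrong.
    static int acknowledge(int numberOfMessages, List<String> ids) {
        if (numberOfMessages != ids.size()) {
            throw new IllegalArgumentException(
                    "numberOfMessages=" + numberOfMessages + " but got " + ids.size() + " ids");
        }
        return ids.size();
    }

    // Shape 2: the dependent value is derived from the argument that
    // determines it, so there is nothing for the caller to get wrong.
    static int acknowledge(List<String> ids) {
        return ids.size();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("id-1", "id-2");
        System.out.println(acknowledge(ids));
        try {
            acknowledge(3, ids);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A `seek(topic, msgId)` overload has the same shape as the first method: the topic is already determined by the message id, so passing it separately only creates room for a mismatch.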

On Wed, Dec 21, 2022 at 10:46 PM Yunze Xu wrote:
>
> > If messageID does not contain `TopicName`, the `TopicName` is best get from 
> > msg.
>
> If you mean `msg.getTopicName()`, how can you declare it's better than
> `msgId.getOwnerTopic()`?
>
> > but it still cannot avoid `TopicName` for marking this `MessageID` belongs 
> > to this topic.
>
> It can. Because the `TopicMessageIdImpl` already contains the correct
> topic. That's the point.
>
> > if using `TopicMessageId` also has the same problem, why we need to use 
> > `TopicMessageId` not `MessageId`
>
> Because `TopicMessageId` is constructed by the Pulsar Client library
> itself, which can guarantee `getOwnerTopic()` returns the correct
> topic name. The benefit of passing a `TopicMessageId` rather than the
> combination of a topic name and a `MessageId` is, users won't need to
> care about how to get the correct topic name for a given partition by
> themselves.
>
> The key point is that if there is only one valid value for an
> argument, which relies on the other argument, then the API design is
> bad. Assume you need to use the Pulsar client like:
>
> ```
> // numberOfMessages must be the same with msgIds.size(), otherwise, an
> exception will be thrown
> consumer.acknowledge(numberOfMessages, msgIds);
> ```
>
> With the API of this proposal, users don't need to care much about how
> to call `seek` correctly, except the MessageId is returned by
> Producer#send. `consumer.seek(msg.getMessageId())` works for all
> cases.
>
> With the `seek(String, MessageId)` API, you have to write more
> explanations like:
> 1. If the consumer only subscribes to a topic, use
> `consumer.seek(msg.getMessageId())`.
> 2. If the consumer subscribes to multiple topics, use
> `consumer.seek(topic, msg.getMessageId())`. The topic must be what the
> message belongs to, so you have to use the correct topic like
> `consumer.seek(msg.getTopicName(), msg.getMessageId()`. Otherwise,
> seek would fail.
>
> I don't know what you're thinking about using
> `consumer.seek(msg.getTopicName(), msg.getMessageId()` for a single
> topic consumer. If it's accepted, and you want to unify the use case
> of `seek`, the original `seek` API should be deprecated and much
> existing code could be affected. If it's not accepted, users have to
> distinguish if a consumer is a multi-topics consumer.
>
> Thanks,
> Yunze
>
> On Wed, Dec 21, 2022 at 8:50 PM 丛搏  wrote:
> >
> > Hi, Yunze:
> >
> > < ```java
> > < var msg = multiTopicsConsumer.receive();
> > < var msgId = (TopicMessageId) multiTopicsConsumer.getMessageId();
> > < consumer.seek(msgId.getOwnerTopic(), msgId);
> > < ```
> >
> > The code could look like this:
> >  ```java
> > var msg = anyConsumer.receive();
> > var msgId = anyConsumer.getMessageId();
> > consumer.seek(msg, msgId);
> >  ```
> > If the `MessageId` does not contain the `TopicName`, it is best to get
> > the `TopicName` from msg.
> >
> > < What's different is that the offset in Kafka can represent a position
> > < of ANY partition, while the MessageId in Pulsar can only represent the
> > < position of A SPECIFIC partition.
> >
> > Although a MessageId in Pulsar can only represent a position in A
> > SPECIFIC partition, it still needs a TopicName. `LedgerID` and
> > `EntryID` alone do not say which topic this `MessageID` belongs to
> > (although it does belong to one), so a `TopicName` is still needed to
> > mark that this `MessageID` belongs to this topic.
> >
> > > And in Pulsar, we also

Re: [VOTE] Pulsar Client Python Release 3.0.0 Candidate 3

2022-12-21 Thread
+1 (non-binding)

python version: 3.7.6
- Checked the signature
- Install the python .whl file (pip3 install
pulsar_client-3.0.0-cp37-cp37m-macosx_10_15_universal2.whl) on macOS
12.3.1
- Start the standalone (2.10.2)
- Start consumer (python3 ./examples/consumer.py)
- Start producer (python3 ./examples/producer.py )

Thanks,
Bo

On Wed, Dec 21, 2022 at 6:01 PM PengHui Li wrote:
>
> +1
>
> - Checked the signature
> - Install the python .whl file (pip3 install
> pulsar_client-3.0.0-cp38-cp38-macosx_10_15_universal2.whl) on macOS 13.0.1
> (22A400)
> - Start the standalone (build from the master branch)
> - Start consumer (python3 ./examples/consumer.py)
> - Start producer (python3 ./examples/producer.py)
>
> --
> Penghui
>
> On Sat, Dec 17, 2022 at 10:35 AM Matteo Merli 
> wrote:
>
> > +1
> >
> >  * From candidate-2 I checked the source tarball for licenses, notice
> > and no binaries included.
> > --
> > Matteo Merli
> > 
> >
> > On Thu, Dec 15, 2022 at 11:30 PM Yunze Xu 
> > wrote:
> > >
> > > This is the third release candidate for Apache Pulsar Client Python,
> > > version 3.0.0.
> > >
> > > It fixes the following issues:
> > > https://github.com/apache/pulsar-client-python/milestone/1?closed=1
> > >
> > > *** Please download, test and vote on this release. This vote will
> > > stay open for at least 72 hours ***
> > >
> > > Python wheels:
> > >
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-python-3.0.0-candidate-3/
> > >
> > > The supported python versions are 3.7, 3.8, 3.9 and 3.10. The
> > > supported platforms and architectures are:
> > > - Windows x86_64 (windows/)
> > > - glibc-based Linux x86_64 (linux-glibc-x86_64/)
> > > - glibc-based Linux arm64 (linux-glibc-arm64/)
> > > - musl-based Linux x86_64 (linux-musl-x86_64/)
> > > - musl-based Linux arm64 (linux-musl-arm64/)
> > > - macOS universal 2 (macos/)
> > >
> > > The tag to be voted upon: v3.0.0-candidate-3
> > > (46acc487ad16fdc0aeea9dae64484030e62c1b96)
> > >
> > https://github.com/apache/pulsar-client-python/releases/tag/v3.0.0-candidate-3
> > >
> > > Pulsar's KEYS file containing PGP keys you use to sign the release:
> > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > >
> > > Please download the Python wheels and follow the README to test.
> > >
> > > Thanks,
> > > Yunze
> >


Re: [DISCUSS] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-21 Thread
Hi, Yunze:

< ```java
< var msg = multiTopicsConsumer.receive();
< var msgId = (TopicMessageId) multiTopicsConsumer.getMessageId();
< consumer.seek(msgId.getOwnerTopic(), msgId);
< ```

The code could look like this:
 ```java
var msg = anyConsumer.receive();
var msgId = anyConsumer.getMessageId();
consumer.seek(msg, msgId);
 ```
If the `MessageId` does not contain the `TopicName`, it is best to get the `TopicName` from msg.

< What's different is that the offset in Kafka can represent a position
< of ANY partition, while the MessageId in Pulsar can only represent the
< position of A SPECIFIC partition.

Although a MessageId in Pulsar can only represent a position in A
SPECIFIC partition, it still needs a TopicName. `LedgerID` and
`EntryID` alone do not say which topic this `MessageID` belongs to
(although it does belong to one), so a `TopicName` is still needed to
mark that this `MessageID` belongs to this topic.

> And in Pulsar, we also do not expose
> the partition concept, if we introduce the seek API with the topic
> name as the argument, we have to explain in detail about what's the
> topic name for a partition. It could be a very confusing thing from my
> experience when I explained the "partition" concept in community.

If using `TopicMessageId` has the same problem, why do we need to
use `TopicMessageId` instead of `MessageId`?

Thanks,
Bo

On Wed, Dec 21, 2022 at 4:59 PM Yunze Xu wrote:
>
> Hi Bo,
>
> If we have the `seek` API that accepts a topic name, how to use seek
> for a single topic consumer and multi-topics consumer will be
> different.
>
> ```java
> var msg = singleTopicConsumer.receive();
> var msgId = singleTopicConsumer.getMessageId();
> consumer.seek(msgId);
> ```
>
> ```java
> var msg = multiTopicsConsumer.receive();
> var msgId = (TopicMessageId) multiTopicsConsumer.getMessageId();
> consumer.seek(msgId.getOwnerTopic(), msgId);
> ```
>
> It's not as clear as you have thought. A question could come from the
> code above: since we can get the key (topic name) from `msgId` itself,
> why do we need another argument?
>
> What's worse is that users have to specify the correct topic name. For
> a partitioned topic, if users specified another partition, the `seek`
> operation would fail. If they specified something like
> `multiTopicsConsumer.getTopic()`, it would also fail because other
> APIs like `Consumer#getTopic()` don't return the correct topic name.
>
> If there is only one correct topic name for a given TopicMessageId,
> what's the point of making it a required argument?
>
> BTW, let's see Kafka client's commit API:
>
> ```java
> public void commitSync(Map<TopicPartition, OffsetAndMetadata> offsets)
> ```
>
> What's different is that the offset in Kafka can represent a position
> of ANY partition, while the MessageId in Pulsar can only represent the
> position of A SPECIFIC partition. And in Pulsar, we also do not expose
> the partition concept, if we introduce the seek API with the topic
> name as the argument, we have to explain in detail about what's the
> topic name for a partition. It could be a very confusing thing from my
> experience when I explained the "partition" concept in community.
>
> Thanks,
> Yunze
>
>
> On Wed, Dec 21, 2022 at 3:20 PM 丛搏  wrote:
> >
> > Hi Yunze,
> >
> > Adding `TopicMessageId` will couple the MessageId and the topic name together,
> > which is very unclear for non-partitioned topics.
> >
> > ```
> > void seek(String topicName, MessageId messageId) throws PulsarClientException;
> > List> getLastTopicMessageId() throws PulsarClientException;
> > ```
> > If the interface is designed in this way, it may be simpler, easier to
> > understand, and more intuitive for users, and MessageID will not be
> > coupled with TopicName.
> >
> > Because a VOTE has already been started for this PIP, I will also post
> > this reply to the PIP-224 VOTE thread [0].
> >
> > Thanks,
> > Bo
> > [0] https://lists.apache.org/thread/mbrpjsgrgwrlkdpvkk738jxnlk7rf4qk
> >
> > On Fri, Dec 9, 2022 at 2:33 PM Yunze Xu wrote:
> > >
> > > Hi Jiaqi,
> > >
> > > Let's move to https://lists.apache.org/thread/mbrpjsgrgwrlkdpvkk738jxnlk7rf4qk
> > > for the vote.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Fri, Dec 9, 2022 at 1:54 PM Jiaqi Shen  wrote:
> > > >
> > > > This makes sense to me, +1
> > > >
> > > > Thanks,
> > > > Jiaqi Shen
> > > >
> > > >
> > > > On Wed, Dec 7, 2022 at 1:51 PM Yunze Xu wrote:
> > > >
> > > > > Hi Baodi,
> > > > >
> > > > > I decided not to change the behavi

Re: [VOTE] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-20 Thread
Hi Yunze,

Adding `TopicMessageId` will couple the MessageId and the topic name together,
which is very unclear for non-partitioned topics.

```
void seek(String topicName, MessageId messageId) throws PulsarClientException;
List> getLastTopicMessageId() throws PulsarClientException;
```
If the interface is designed in this way, it may be simpler, easier to
understand, and more intuitive for users, and MessageID will not be
coupled with TopicName.

Thanks,
Bo

On Fri, Dec 16, 2022 at 3:31 PM Yunze Xu wrote:
>
> Yeah, it's an implementation detail and I will keep the same semantics
> with the latest master when I push my PR.
>
> Thanks,
> Yunze
>
> On Fri, Dec 16, 2022 at 3:03 PM 丛搏  wrote:
> >
> > if you don't change this in PIP-229 or PIP-224, I will create a new
> > PIP to handle the `BatchMessageIdImpl` and `MessageIdImpl`
> > `compareTo()` method, now I have no problem with this PIP
> > +1 (non-binding)
> > Sorry to bother this PIP vote.
> >
> > Thanks,
> > Bo
> >
> > On Fri, Dec 16, 2022 at 11:58 AM Yunze Xu wrote:
> > >
> > > If this breaking change can pass the PMC votes, I will keep the new
> > > semantics in PIP-229. Otherwise, it would not make sense to adopt the
> > > new semantics in PIP-229.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Fri, Dec 16, 2022 at 11:46 AM Yunze Xu  wrote:
> > > >
> > > > I cannot find any confusing code from the proposal itself. Could you
> > > > point it out? If you are mentioning the `legacyCompare` and `compare`
> > > > methods in #18890 [1], it's not related to this proposal. And I have
> > > > opened PIP-229 [2] for discussion.
> > > >
> > > > BTW, the PIP-229 itself doesn't mention the compare logic. But I'm not
> > > > going to adopt the new semantics because it's actually a breaking
> > > > change, just as I've replied. You might think it's a bug, but it's a
> > > > public API. Any change of the semantics in the public API is a
> > > > breaking change.
> > > >
> > > > [1] https://github.com/apache/pulsar/pull/18890/files
> > > > [2] https://lists.apache.org/thread/x52zpwlo8pxzp81oxllh5vw82kyrzgpk
> > > >
> > > > On Fri, Dec 16, 2022 at 11:34 AM 丛搏  wrote:
> > > > >
> > > > > Although unrelated, it adds a lot of confusing code.
> > > > >
> > > > > Thanks,
> > > > > Bo
> > > > >
> > > > > On Fri, Dec 16, 2022 at 8:05 AM Yunze Xu wrote:
> > > > >
> > > > > >
> > > > > > This proposal is not related to the comparison logic between
> > > > > > BatchMessageIdImpl and MessageIdImpl.
> > > > > >
> > > > > > Thanks,
> > > > > > Yunze
> > > > > >
> > > > > > On Thu, Dec 15, 2022 at 12:58 PM 丛搏  wrote:
> > > > > > >
> > > > > > > -1 (non-binding)
> > > > > > > Sorry, I have one question about the BatchMessageId compareTo()
> > > > > > > method. The discussion mail:
> > > > > > > https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg.
> > > > > > >
> > > > > > > I hope this issue can be discussed clearly. I will vote again
> > > > > > > once it has been resolved.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Bo
> > > > > > >
> > > > > > > On Wed, Dec 14, 2022 at 10:56 PM 丛搏 wrote:
> > > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Bo
> > > > > > > >
> > > > > > > > On Wed, Dec 14, 2022 at 7:12 PM PengHui Li wrote:
> > > > > > > > >
> > > > > > > > > +1 (binding)
> > > > > > > > >
> > > > > > > > > - Penghui
> > > > > > > > >
> > > > > > > > > On Sun, Dec 11, 2022 at 6:36 AM Enrico Olivelli wrote:
> > > > > > > > >
> > > > > > > > > > +1 (binding)

Re: [DISCUSS] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-20 Thread
Hi Yunze,

Adding `TopicMessageId` will couple the MessageId and the topic name together,
which is very unclear for non-partitioned topics.

```
void seek(String topicName, MessageId messageId) throws PulsarClientException;
List> getLastTopicMessageId() throws PulsarClientException;
```
If the interface is designed in this way, it may be simpler, easier to
understand, and more intuitive for users, and MessageID will not be
coupled with TopicName.

Because a VOTE has already been started for this PIP, I will also post
this reply to the PIP-224 VOTE thread [0].

Thanks,
Bo
[0] https://lists.apache.org/thread/mbrpjsgrgwrlkdpvkk738jxnlk7rf4qk

On Fri, Dec 9, 2022 at 2:33 PM Yunze Xu wrote:
>
> Hi Jiaqi,
>
> Let's move to https://lists.apache.org/thread/mbrpjsgrgwrlkdpvkk738jxnlk7rf4qk
> for the vote.
>
> Thanks,
> Yunze
>
> On Fri, Dec 9, 2022 at 1:54 PM Jiaqi Shen  wrote:
> >
> > This makes sense to me, +1
> >
> > Thanks,
> > Jiaqi Shen
> >
> >
> > On Wed, Dec 7, 2022 at 1:51 PM Yunze Xu wrote:
> >
> > > Hi Baodi,
> > >
> > > I decided not to change the behavior of the `negativeAcknowledge`
> > > method. I just checked again that there is no exception signature for
> > > this method and there is no asynchronous version like
> > > `negativeAcknowledgeAsync`. To keep the API compatible, we should not
> > > add an exception signature, which would be required if a
> > > `PulsarClientException` was thrown.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Tue, Nov 29, 2022 at 10:12 PM Baodi Shi wrote:
> > > >
> > > > Hi, Yunze:
> > > >
> > > > Thanks for your proposal. That Looks good to me.
> > > >
> > > > `negativeAcknowledge` also needs to add the same checks as the new
> > > > acknowledge interface.
> > > >
> > > > > This interface doesn't add any acknowledge overload because the
> > > > > overloads are already too many. But it will make the behavior clear.
> > > > I think since we exposed the TopicMessageId, it would be better to add
> > > > overloaded interfaces (even if the overloads are a lot). This lets users
> > > > clearly associate the use cases of MultiTopicConsumer and TopicMessageId.
> > > >
> > > > Also, while it's okay to use a TopicMessageId param on a single consumer,
> > > > I guess we shouldn't allow users to use it.
> > > >
> > > > In this way, users are clearly aware that TopicMessageId is used when
> > > > using MultiTopicConsumer and MessageId is used when using
> > > > SingleTopicConsumer. (Maybe it's not a good idea.)
> > > >
> > > >
> > > > Thanks,
> > > > Baodi Shi
> > > >
> > > > > On Nov 29, 2022, at 3:57 PM, Yunze Xu wrote:
> > > > >
> > > > >> Is there a case where the user uses the messageId returned by the
> > > > > producer to seek in the consumer? Is this a good behavior?
> > > > >
> > > > > Yes. I think it should be acceptable. To correct my previous point,
> > > > > now I think the MessageId returned by send should also be able to be
> > > > > applied for seek or acknowledge.
> > > > >
> > > > >> even with the
> > > > > current proposal, it may return null when getting the topic from
> > > > > TopicMessageId for backward compatibility.
> > > > >
> > > > > No. It may return null only because Java cannot enforce a non-null
> > > > > return value. The internal implementations of
> > > > > TopicMessageId#getOwnerTopic should return a non-null topic name to
> > > > > avoid null checks.
> > > > >
> > > > > When I mentioned **the implementation of getTopicName() must return
> > > > > null**, the assumption is that MessageId#toByteArray serializes the
> > > > > topic name if adding the `getTopicName()` method. However, in this
> > > > > proposal, `TopicMessageId#toByteArray` won't. See the implementation
> > > > > of `TopicMessageId#create`. It's only a wrapper for an arbitrary
> > > > > MessageId implementation.
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Tue, Nov 29, 2022 at 2:47 PM Zike Yang  wrote:
> > > > >>
> > > > >> Hi Yunze,
> > > > >>
> > > > >> Thanks for your proposal. Quoted from your GitHub comments[0]:
> > > > >>
> > > > > >>> There is also a case when MessageId is returned from Producer#send.
> > > > > >>> In this case, the returned MessageId should only be used for serialization
> > > > >>
> > > > >> Is there a case where the user uses the messageId returned by the
> > > > >> producer to seek in the consumer? Is this a good behavior?
> > > > >>
> > > > > >>> If we added the method directly to MessageId, to keep the backward
> > > > > >>> compatibility, the implementation of getTopicName() must return null,
> > > > > >>> which is not a good design.
> > > > >>
> > > > >> I think it's a trade-off. If I understand correctly, even with the
> > > > >> current proposal, it may return null when getting the topic from
> > > > >> TopicMessageId for backward compatibility. The current
> > > > >> TopicMessageIdImpl doesn't serialize the topic information.
> > > > >>
> > > > >>
> > > > >> [0]
> > > https://github.com/apache/pulsar/issues/18616#issuecomment-1328609346
> > > > >>
> > > > >> Thanks,
> > > > >> Zike Yang
> > > > >>
> > > > >> On Mon, Nov 28, 2022 at 

Re: [DISCUSS] PIP-229: Add a common interface to get fields of MessageIdData

2022-12-20 Thread
Hi, Yunze

> This interface is introduced mainly because we
> don't have a String parameter in seek and acknowledge before. It's a
> tradeoff between compatibility and  complexity.

I think `MessageIdAdv` plus an added topic name is redundant and more
confusing for non-partitioned topics. If we don't couple TopicName and
MessageId together, it is better to change our interface design to take
two parameters (topic name, MessageId). This is clearer and more
intuitive, it is easier for users to understand MessageId, and
MessageId itself also becomes simpler. I will sync this point to
PIP-224-DISCUSS [0].

I don't have any problem with this PIP now. Thanks for your reply.
+1 (non-binding)

Thanks,
Bo
[0] https://lists.apache.org/thread/jhqy65cdyxzmmxnfsjm8rv9pbk76noxy

On Tue, Dec 20, 2022 at 3:27 PM Yunze Xu wrote:
>
> Hi Bo,
>
> > But when `topicName` is not in the `PulsarApiMessageId`, they should use
> > `messageId instanceof TopicMessageId` to get the `topicName`
>
> The scenario only happens for `acknowledge` APIs because I don't want
> to add more overloads in PIP-224. The new `seek` overloads in PIP-224
> just accept the `TopicMessageId` parameter.
>
> Combining the PIP-224 and PIP-229, the code will look like:
>
> ```java
> // MultiTopicsConsumerImpl
> void acknowledge(MessageId msgId) {
>     if (!(msgId instanceof TopicMessageId)) {
>         throw new PulsarClientException.NotAllowedException("");
>     }
>     consumers.get(((TopicMessageId) msgId).getOwnerTopic()).acknowledge(msgId);
> }
> void seek(TopicMessageId topicMessageId) {
>     consumers.get(topicMessageId.getOwnerTopic()).seek(topicMessageId);
> }
> ```
>
> ```java
> // ConsumerImpl
> void acknowledge(MessageId msgId) {
>     internalAcknowledge((MessageIdAdv) msgId); // we never need the owner topic now
> }
> void seek(TopicMessageId topicMessageId) {
>     internalSeek((MessageIdAdv) topicMessageId);
> }
> ```
>
> That's why we need two interfaces. The use cases of TopicMessageId are
> very rare and limited. This interface is introduced mainly because we
> don't have a String parameter in seek and acknowledge before. It's a
> tradeoff between compatibility and  complexity. But the use cases of
> MessageIdAdv are very common.
>
> Thanks,
> Yunze
>
> On Mon, Dec 19, 2022 at 9:00 PM 丛搏  wrote:
> >
> > Hi Yunze :
> >
> > I have no reason to must add the `getTopicName()` into the
> > `PulsarApiMessageId` (or named `MessageIdAdv `).
> > I just thought of a scenario that users would use. For the consumer
> > side, they don't know whether the topic is a partitioned topic or a
> > non-partitioned topic. Maybe they don't care about it. But when
> > `topicName` is not in the `PulsarApiMessageId`, they should use
> > `messageId instanceof TopicMessageId` to get the `topicName`, if this
> > `topicName` is in the `PulsarApiMessageId`, they only use the unified
> > logic to get the field from the messageId `PulsarApiMessageId
> > pulsarApiMessageId = (PulsarApiMessageId) messageId`
> >
> > These are just some of my views, and it will not block this PIP.
> >
> > Thanks,
> > Bo
> >
> > On Mon, Dec 19, 2022 at 10:41 AM Yunze Xu wrote:
> > >
> > > Hi Bo,
> > >
> > > For a single-topic consumer, the MessageId implementations returned by
> > > receive() should be a MessageIdImpl or BatchedMessageIdImpl. Is there
> > > any reason to add a `getTopicName()` method for them (via extending
> > > TopicMessageId)? If yes, we have to use the default implementation of
> > > `getTopicName()` that returns null without modifying any existing
> > > code. If you don't want to return a null value, you have to add new
> > > logic that sets the topic explicitly when creating the received
> > > message.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Sun, Dec 18, 2022 at 3:44 PM 丛搏  wrote:
> > > >
> > > > < For a single-topic consumer, wrapping the topic name is
> > > > < redundant and might break the existing behavior. In this case, if
> > > > < `PulsarApiMessageId` extends `TopicMessageId`, the `getTopicName()`
> > > > < method should return null, which is not a good design [1][2].
> > > >
> > > > For `TopicMessageIdImpl`, it is an original method. for
> > > > `PulsarApiMessageId` if extend `TopicMessageId` it is a new method for
> > > > any `MessageId` extend `PulsarApiMessageId`, why do we have to return
> > > > null? I think it just reduces the transmission of useless fields at
> > > > the network layer and not added to MessageIdDat

Re: [DISCUSS] Introduce oshi library to sensory OS resources

2022-12-20 Thread
+1 (non-binding)

Thanks,
Bo

On Wed, Dec 21, 2022 at 11:48 AM PengHui Li wrote:
>
> +1
>
> Thanks,
> Penghui
>
> On Mon, Dec 19, 2022 at 6:19 PM  wrote:
>
> >
> > Hi, All
> >
> > I would like to introduce a new library oshi[1] to help Apache Pulsar
> > sensory OS resources. It can help us to get away from the complex file
> > manipulation and cross-platform compatibility issues in some operating
> > systems.
> >
> > code example:   https://github.com/apache/pulsar/pull/18984
> >
> > Please feel free to leave comments.
> >
> >
> > Best,
> > Mattison
> >
> >
> > [1] https://github.com/oshi/oshi
> >


Re: [DISCUSS] Change the default IO threads and listener threads of Java Client

2022-12-20 Thread
+1
The default configuration should be the best fit for most users. Running
multiple clients is a rare case.
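
As a rough illustration of the CPU-based default quoted below, combined with an upper bound in the spirit of the `max_io_thread_num` idea: the helper picks one thread per available core and clamps the result. `ThreadDefaults`, `defaultThreads`, and the cap parameter are assumptions made for this sketch, not Pulsar client configuration, and the static cap is a simplification of growing the pool as producers and consumers are added.

```java
// Hypothetical helper; the names and the cap are assumptions, not Pulsar configuration.
final class ThreadDefaults {
    private ThreadDefaults() {}

    // One thread per available core, but never fewer than 1 or more than maxThreads.
    static int defaultThreads(int availableCores, int maxThreads) {
        if (maxThreads < 1) {
            throw new IllegalArgumentException("maxThreads must be at least 1");
        }
        return Math.max(1, Math.min(availableCores, maxThreads));
    }

    // Convenience overload that reads the core count from the running JVM.
    static int defaultThreads(int maxThreads) {
        return defaultThreads(Runtime.getRuntime().availableProcessors(), maxThreads);
    }
}
```

A cap like this keeps the thread count bounded on machines with many cores, which matters most when one process creates several clients.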

Thanks,
Bo

On Tue, Dec 20, 2022 at 4:02 PM houxiaoyu wrote:
>
> +1
>
> This change might bring thread number increment in case users create many
> clients, but too many pulsar clients run in one machine is not a good use
> case I think,  so this change looks good to me.
>
> Thanks,
> Xiaoyu Hou
>
> On Tue, Dec 20, 2022 at 12:25 PM  wrote:
>
> > +1
> >
> > My concern is whether this change will affect some users who are creating
> > many clients. I think we can wait for other users to confirm it. (If this
> > will be affected, maybe we can give it a max_io_thread_num and then expand
> > the size from 1 to max_io_thread_num when adding a new consumer or producer)
> >
> >
> > Best,
> > Mattison
> > On Dec 20, 2022, 11:17 +0800, PengHui Li , wrote:
> > > Hi all,
> > >
> > > I noticed the Java Client (I haven't checked other clients) uses 1 IO
> > > thread and 1 listener
> > > thread by default. It will require users to update the thread
> > configuration
> > > if they have
> > > multiple cores and desired high throughput.
> > >
> > > Here is the example that we change to 16 IO threads in
> > > openmessaging benchmark
> > >
> > https://github.com/openmessaging/benchmark/blob/master/driver-pulsar/pulsar.yaml#L22
> > >
> > > We can apply the configuration of the threads based on the CPU cores. So
> > > that for the
> > > most common cases, users don't need to touch the thread configuration.
> > >
> > > ```
> > > private int numIoThreads = Runtime.getRuntime().availableProcessors();
> > > > private int numListenerThreads = Runtime.getRuntime().availableProcessors();
> > > ```
> > >
> > > WDYT?
> > >
> > > Thanks,
> > > Penghui
> >


Re: [VOTE] PIP-228: Refactor the information architecture of Pulsar client docs

2022-12-20 Thread
+1 (non-binding)

Thanks,
Bo

On Wed, Dec 21, 2022 at 9:51 AM Yu wrote:
>
> +1
>
> On Tue, Dec 20, 2022 at 3:12 PM Yunze Xu wrote:
>
> > +1 (non-binding)
> >
> > Thanks,
> > Yunze
> >
> > On Tue, Dec 20, 2022 at 3:06 PM Zike Yang  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Thanks,
> > > Zike Yang
> > >
> > > On Tue, Dec 13, 2022 at 4:38 PM Jun Ma  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I'm going to start the vote for PIP-228 [Refactor the information
> > architecture of Pulsar client docs](
> > https://github.com/apache/pulsar/issues/18822).
> > > >
> > > > And this is the original thread for discussion:
> > https://lists.apache.org/thread/bv6lwnt708dxst173knyzv2bfy4d1ox4.
> > > >
> > > > The vote will be open for at least three days.
> > > >
> > > >
> > > > Thank you.
> > > > Jun
> >


Re: [DISCUSS] Proposal required for Admin API/CLI and metrics changes

2022-12-19 Thread
Hi Penghui:

< All of the discussions here are about the Admin API / CLI changes
< The metrics changes are about adding new metrics added, adding new labels,
< removing labels, etc. It's not about the implementations. A bug fix usually
< happens to the implementation, not the API definition or metrics definition.
Sorry, I didn't express myself clearly.

>  ### Does this pull request potentially affect one of the following
>  parts:
>
> *If the box was checked, please highlight the changes*
>
>  - [ ] Dependencies (add or upgrade a dependency)
>  - [ ] The public API
>  - [ ] The schema
>  - [ ] The default values of configurations
>  - [ ] The binary protocol
> - [ ] The REST endpoints
>  - [ ] The admin CLI options
>  - [ ] Anything that affects deployment
>  ```
>
>  And the CI can try to add labels `doc required` or `wants/proposal`
> according to the
> list selections.
>  And we can add `The metrics` item to the list.
Maybe we should add a box named `metrics (adding new metrics, adding new
labels, removing labels, etc.)`. Otherwise, the metrics box will also be checked
for metrics bug fixes, as with the schema option.

> I think we can just update the contribution guide
> https://pulsar.apache.org/contributing/ to make it more clear?
> The contribution guide can be used by contributors / reviewers.
> And not only a committer can be a reviewer.
Yes, I think we can.

Thanks,
Bo



On Mon, Dec 19, 2022 at 4:22 PM PengHui Li wrote:
>
> Hi Bo,
>
> > yes, need a proposal for Admin API/CLI and metrics changes. but It is
> > difficult for us to judge whether a PR is a bug fix or needs PIP or
> > doc/required through automatic detection. We can standardize the
> > committer review process and increase the description of the review
> > document, which may achieve better results.
>
> All of the discussions here are about the Admin API / CLI changes
> The metrics changes are about adding new metrics added, adding new labels,
> removing labels, etc. It's not about the implementations. A bug fix usually
> happens to the implementation, not the API definition or metrics definition.
>
> I think we can just update the contribution guide
> https://pulsar.apache.org/contributing/ to make it more clear?
> The contribution guide can be used by contributors / reviewers.
> And not only a committer can be a reviewer.
>
> For the review process and review document. It can be a separate
> thread if you have any good ideas you want to share.
>
> Thanks,
> Penghui
>
>
> On Sat, Dec 17, 2022 at 9:16 PM 丛搏  wrote:
>
> > > Is it time to require a proposal for Admin API/CLI and metrics changes?
> >
> > yes, need a proposal for Admin API/CLI and metrics changes. but It is
> > difficult for us to judge whether a PR is a bug fix or needs PIP or
> > doc/required through automatic detection. We can standardize the
> > committer review process and increase the description of the review
> > document, which may achieve better results.
> >
> > Thanks,
> > Bo
> >
> > On Sat, Dec 17, 2022 at 9:00 AM PengHui Li wrote:
> > >
> > > I have pushed out a PR to update the PR template and PIP for metrics
> > changes
> > >
> > > https://github.com/apache/pulsar/pull/18961
> > >
> > > PTAL.
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Wed, Dec 7, 2022 at 4:17 PM Haiting Jiang wrote:
> > >
> > > > +1 for enforcing the PIP procedures.
> > > >
> > > > > And the CI can try to add labels `doc required` or `wants/proposal`
> > > > > according to the list selections.
> > > >
> > > > Is it possible that the CI can check if there is a "voted" PIP linking
> > > > to this PR.
> > > > And the label can be manually added by committers if the PR author
> > > > missed checking the boxes.
> > > >
> > > > Thanks,
> > > > Haiting
> > > >
> > > > On Wed, Dec 7, 2022 at 4:07 PM PengHui Li  wrote:
> > > > >
> > > > > > I agree a proposal would be better before adding a PR. But the
> > > > > document part must be a part of such a proposal.
> > > > >
> > > > > Make sense. It looks like we should have a checklist for the
> > proposal.
> > > > > The documentation changes should be listed in the proposal.
> > > > >
> > > > > > Can the PR template/GitHub process check that if either the api
> > changes
> > > > > and doc-required are checked both are checked with textual
> > information
> > > > > 

Re: [DISCUSS] PIP-229: Add a common interface to get fields of MessageIdData

2022-12-19 Thread
Hi Yunze :

I have no strong reason to insist on adding `getTopicName()` to
`PulsarApiMessageId` (or `MessageIdAdv`, as it may be named).
I just thought of a scenario that users would run into. On the consumer
side, they don't know whether the topic is a partitioned topic or a
non-partitioned topic, and maybe they don't care. But when
`topicName` is not in `PulsarApiMessageId`, they have to use
`messageId instanceof TopicMessageId` to get the `topicName`. If
`topicName` is in `PulsarApiMessageId`, they only need one unified
piece of logic to get the field from the messageId: `PulsarApiMessageId
pulsarApiMessageId = (PulsarApiMessageId) messageId`.
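
To make the trade-off concrete, here is a minimal sketch of the two-interface design, assuming simplified shapes. `DemoMessageId`, `DemoMessageIdAdv`, `DemoTopicMessageId`, `PlainId`, `TopicId`, and `TopicNames` are hypothetical names for illustration; only `getOwnerTopic()`, the ledger ID, and the entry ID are taken from the discussion. The caller needs an `instanceof` check for the topic, while the position fields are reachable through one cast.

```java
import java.util.Optional;

// Hypothetical, simplified shapes of the interfaces discussed in PIP-224 and PIP-229.
interface DemoMessageId {}

interface DemoMessageIdAdv extends DemoMessageId {
    long getLedgerId();
    long getEntryId();
}

interface DemoTopicMessageId extends DemoMessageId {
    String getOwnerTopic();
}

// ID from a single-topic consumer: position fields only.
record PlainId(long ledgerId, long entryId) implements DemoMessageIdAdv {
    public long getLedgerId() { return ledgerId; }
    public long getEntryId() { return entryId; }
}

// ID from a multi-topics consumer: position fields plus the owner topic.
record TopicId(String ownerTopic, long ledgerId, long entryId)
        implements DemoMessageIdAdv, DemoTopicMessageId {
    public long getLedgerId() { return ledgerId; }
    public long getEntryId() { return entryId; }
    public String getOwnerTopic() { return ownerTopic; }
}

final class TopicNames {
    private TopicNames() {}

    // The topic is present only when the ID also implements the topic interface.
    static Optional<String> ownerTopicOf(DemoMessageId id) {
        if (id instanceof DemoTopicMessageId) {
            return Optional.of(((DemoTopicMessageId) id).getOwnerTopic());
        }
        return Optional.empty();
    }
}
```

Returning an `Optional` here is one way to avoid a null topic name for IDs that come from a single-topic consumer.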

These are just some of my views, and it will not block this PIP.

Thanks,
Bo

On Mon, Dec 19, 2022 at 10:41 AM Yunze Xu wrote:
>
> Hi Bo,
>
> For a single-topic consumer, the MessageId implementations returned by
> receive() should be a MessageIdImpl or BatchedMessageIdImpl. Is there
> any reason to add a `getTopicName()` method for them (via extending
> TopicMessageId)? If yes, we have to use the default implementation of
> `getTopicName()` that returns null without modifying any existing
> code. If you don't want to return a null value, you have to add new
> logic that sets the topic explicitly when creating the received
> message.
>
> Thanks,
> Yunze
>
> On Sun, Dec 18, 2022 at 3:44 PM 丛搏  wrote:
> >
> > < For a single-topic consumer, wrapping the topic name is
> > < redundant and might break the existing behavior. In this case, if
> > < `PulsarApiMessageId` extends `TopicMessageId`, the `getTopicName()`
> > < method should return null, which is not a good design [1][2].
> >
> > For `TopicMessageIdImpl`, it is an original method. for
> > `PulsarApiMessageId` if extend `TopicMessageId` it is a new method for
> > any `MessageId` extend `PulsarApiMessageId`, why do we have to return
> > null? I think it just reduces the transmission of useless fields at
> > the network layer and not added to MessageIdData. LedgerId and EntryId
> > are in PulsarApiMessageId, why shouldn't `topicName` be added in?
> >
> > Thanks,
> > Bo
> >
> > On Sun, Dec 18, 2022 at 2:23 PM Yunze Xu wrote:
> > >
> > > Hi Bo,
> > >
> > > Because the topic name is not a part of MessageIdData. It's only used
> > > to find the correct internal consumer of a multi-topics consumer.
> > >
> > > > All I can think of is PulsarApiMessageId extend
> > > > TopicMessageId(PIP-224[1]) right?
> > >
> > > No. The `TopicMessageId` could only be used in a multi-topics
> > > consumer. For a single-topic consumer, wrapping the topic name is
> > > redundant and might break the existing behavior. In this case, if
> > > `PulsarApiMessageId` extends `TopicMessageId`, the `getTopicName()`
> > > method should return null, which is not a good design [1][2].
> > >
> > > After both PIP-224 and PIP-229 are approved, the `TopicMessageIdImpl`
> > > will implement both `PulsarApiMessageId` and `TopicMessageId`
> > > interfaces. Other `MessageId` implementations only need to implement
> > > `PulsarApiMessageId`.
> > >
> > > BTW, PIP-224 mainly solves two problems:
> > > 1. When a multi-topics consumer acknowledges a `MessageId` that is not
> > > a `TopicMessageId`, a `PulsarClientException.NotAllowedException` will
> > > be thrown in synchronous methods. The asynchronous methods should not
> > > throw an exception.
> > > 2. For a multi-topics consumer, support seeking with a `TopicMessageId`.
> > >
> > > PIP-224 is designed for application users to specify an associated
> > > topic name when using a `MessageId` in `seek` or `acknowledge` on a
> > > multi-topics consumer. PIP-229 is more like a refactoring to allow the
> > > experienced developers access the fields of `MessageIdData` via a
> > > standard interface.
> > >
> > > [1] https://github.com/apache/pulsar/issues/18616#issuecomment-1328609346
> > > [2] https://lists.apache.org/thread/g8o0qtljllxnvck69dn36205xg5xr8cc
> > >
> > > Thanks,
> > > Yunze
> > >
> > >
> > > On Fri, Dec 16, 2022 at 8:50 PM 丛搏  wrote:
> > > >
> > > > Abstraction based on MessageIdData is a good solution. I don't have
> > > > any discussion context. Why don't we put the topic name in it?
> > > >
> > > > All I can think of is PulsarApiMessageId extend
> > > > TopicMessageId(PIP-224[1]) right?
> > > >
> > > > Thanks,
> > > > Bo
> > > > [1] https://github.com/apache/pulsar/issues/18616
> >

[VOTE] PIP-230: Throw exception when MessageIdImpl and BatchMessageIdImpl compare with each other

2022-12-19 Thread
Hi, pulsar community:

I am starting the vote on the PIP `Throw exception when MessageIdImpl and
BatchMessageIdImpl compare with each other`.

discussion thread:
https://lists.apache.org/thread/oylkh76z1wv8h754c1tvtnspwjojqwmx

PIP: https://github.com/apache/pulsar/issues/18957
PR: https://github.com/apache/pulsar/pull/18981

Thanks,
Bo


Re: [DISCUSS] PIP-230: Modify MessageIdImpl and BatchMessageIdImpl compareTo(MessageId o) method

2022-12-19 Thread
Hi, Xiangying:

I have changed the PIP so that `MessageIdImpl
compareTo(BatchMessageIdImpl)` or `BatchMessageIdImpl
compareTo(MessageIdImpl)` will throw an exception.

Thanks,
Bo

On Mon, Dec 19, 2022 at 8:13 AM Xiangying Meng wrote:
>
> Hi Bo
> I am confused about this proposal.
> Why do there exist MessageID and BatchMessageID with the same ledgerID,
> entryID, and partitionIndex?
> And why it is right that the MessageID is always bigger than the
> BatchMessageID?
> I also leave some comments below on the issue. Could you help answer it?
>
> Thanks,
> Xiangying
>
> On Mon, Dec 19, 2022 at 2:54 AM Yubiao Feng wrote:
>
> > Hi Bo
> >
> > > Disadvantages of the current compareTo() implementation: Many users may
> > use this method incorrectly, resulting in seek, ack, or resetCursor use
> > incorrect MessageId, which has the risk of losing messages, because they
> > will use a larger messageId to seek, ack etc.
> >
> > I'm a little curious why the message will lose, could you explain it?
> >
> > Thanks
> > Yubiao
> >
> > On Fri, Dec 16, 2022 at 5:31 PM 丛搏  wrote:
> >
> > > Hi, pulsar community:
> > >
> > > I start a PIP about modifying MessageIdImpl and BatchMessageIdImpl
> > > compareTo(MessageId o) method
> > >
> > > original discussion thread:
> > > https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg
> > >
> > > PIP: https://github.com/apache/pulsar/issues/18957
> > >
> > > Thanks,
> > > Bo
> > >
> >


Re: [DISCUSS] PIP-230: Modify MessageIdImpl and BatchMessageIdImpl compareTo(MessageId o) method

2022-12-19 Thread
Hi Yubiao:
> I'm a little curious why the message will lose, could you explain it?

For example, a user has two message IDs:
batchMessageId = (ledgerId = 1, entryId = 1, batchIndex = 0, batchSize = 20)
messageId = (ledgerId = 1, entryId = 1, batchSize = 20)

If the user wants to ack the smallest message ID and uses compareTo
on messageId and batchMessageId, they will always end up using
messageId, and then the messages at batchIndex 1-19 will be lost.

I have changed the PIP so that `MessageIdImpl
compareTo(BatchMessageIdImpl)` or `BatchMessageIdImpl
compareTo(MessageIdImpl)` will throw an exception. That way, this
situation can no longer occur.
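
The throwing behavior could be sketched as follows, assuming simplified ID types. `DemoId`, `EntryLevelId`, `BatchLevelId`, and `StrictCompare` are hypothetical names, not Pulsar's `MessageIdImpl` or `BatchMessageIdImpl`, and `IllegalArgumentException` is only an assumed exception type. IDs of the same kind are ordered as usual, while a mixed comparison is refused instead of silently picking one side.

```java
// Hypothetical, simplified IDs; not Pulsar's MessageIdImpl or BatchMessageIdImpl.
interface DemoId {
    long ledgerId();
    long entryId();
}

// Refers to a whole entry.
record EntryLevelId(long ledgerId, long entryId) implements DemoId {}

// Refers to one message inside a batched entry.
record BatchLevelId(long ledgerId, long entryId, int batchIndex, int batchSize)
        implements DemoId {}

final class StrictCompare {
    private StrictCompare() {}

    // Orders two IDs of the same kind; refuses to order an entry-level ID
    // against a batch-level ID.
    static int compare(DemoId a, DemoId b) {
        if (a.getClass() != b.getClass()) {
            throw new IllegalArgumentException("Cannot compare "
                    + a.getClass().getSimpleName() + " with "
                    + b.getClass().getSimpleName());
        }
        int byLedger = Long.compare(a.ledgerId(), b.ledgerId());
        if (byLedger != 0) {
            return byLedger;
        }
        int byEntry = Long.compare(a.entryId(), b.entryId());
        if (byEntry != 0 || !(a instanceof BatchLevelId)) {
            return byEntry;
        }
        // Same entry and both batch-level: the batch index decides.
        return Integer.compare(((BatchLevelId) a).batchIndex(),
                ((BatchLevelId) b).batchIndex());
    }
}
```

Under these assumptions, a caller that mixes the two kinds gets an immediate error and has to decide explicitly which ID to ack or seek to.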

Thanks,
Bo

On Mon, Dec 19, 2022 at 2:54 AM Yubiao Feng wrote:
>
> Hi Bo
>
> > Disadvantages of the current compareTo() implementation: Many users may
> use this method incorrectly, resulting in seek, ack, or resetCursor use
> incorrect MessageId, which has the risk of losing messages, because they
> will use a larger messageId to seek, ack etc.
>
> I'm a little curious why the message will lose, could you explain it?
>
> Thanks
> Yubiao
>
> On Fri, Dec 16, 2022 at 5:31 PM 丛搏  wrote:
>
> > Hi, pulsar community:
> >
> > I start a PIP about modifying MessageIdImpl and BatchMessageIdImpl
> > compareTo(MessageId o) method
> >
> > original discussion thread:
> > https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg
> >
> > PIP: https://github.com/apache/pulsar/issues/18957
> >
> > Thanks,
> > Bo
> >


Re: [VOTE] Pulsar Release 2.11.0 Candidate-3

2022-12-18 Thread
+1 (non-binding)

system: mac os 12.6, Apple M1
maven: 3.8.5
java: OpenJDK 17.0.3

- Checked the signature
- Start standalone with zookeeper stream storage
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package

Thanks,
Bo

guo jiwei  于2022年12月17日周六 11:39写道:
>
> This is the third release candidate for Apache Pulsar, version 2.11.0.
>
> This release contains 1605 commits by 62 contributors.
> https://github.com/apache/pulsar/compare/v2.10.2...v2.11.0-candidate-3
>
> CI for this release candidate
> https://github.com/Technoboy-/pulsar/pull/21
>
> *** Please download, test and vote on this release. This vote will stay open
> for at least 72 hours ***
>
> Note that we are voting upon the source (tag), binaries are provided for
> convenience.
>
> Source and binary files:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.11.0-candidate-3
>
> SHA-512 checksums:
>
> 096014c7a1bb7975c9eb7b7796d225e69cac066ca2e228019a7fffbde08a7a48d377932e5d7a2dca6adf51428e2aa52b1e47035624a554b9e30da024e7933256
>
>  ./apache-pulsar-2.11.0-bin.tar.gz
>
> b8e9bb39f6687190c05b6186bdc80cb0499b6600e8e9a29b942c91e650a64614d5205bf527fa47c1005aa57da96d0a9c2506b8ff2f51332fa8e8beef7bdaf7b3
>
>  ./apache-pulsar-2.11.0-src.tar.gz
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1199/
>
> The tag to be voted upon:
> v2.11.0-candidate-3 (a1f88ff83f2eba822fdfa3b62b01ed75c6dcb9b3)
> https://github.com/apache/pulsar/releases/tag/v2.11.0-candidate-3
>
> Pulsar's KEYS file containing PGP keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/pulsar/KEYS
>
> Docker images:
> https://hub.docker.com/layers/mattison/pulsar-all/2.11.0-rc3/images/sha256-039de7f6c124adf5ee94b05879b4f833a4b8c1d7395a53f95b3cd8171d9f316b
> https://hub.docker.com/layers/mattison/pulsar/2.11.0-rc3/images/sha256-f6f693227a467f7bd3f518224ec99b3c4f23f290fff07449ef18c18e2b0cee6d
>
> Please download the source package, and follow the
> release-candidate-validation doc to build
> and run the Pulsar standalone service.
> https://pulsar.apache.org/contribute/validate-release-candidate
>
> Since the metadata store is changed from ZK to RocksDB, the verification of
> the `stateful functions` needs to set the parameter "export
> PULSAR_STANDALONE_USE_ZOOKEEPER=1"
>
>
> Regards
> Jiwei Guo (Tboy)


Re: [DISCUSS] PIP-229: Add a common interface to get fields of MessageIdData

2022-12-17 Thread
> For a single-topic consumer, wrapping the topic name is
> redundant and might break the existing behavior. In this case, if
> `PulsarApiMessageId` extends `TopicMessageId`, the `getTopicName()`
> method should return null, which is not a good design [1][2].

For `TopicMessageIdImpl`, `getTopicName()` is an existing method. If
`PulsarApiMessageId` extends `TopicMessageId`, it becomes a new method
for every `MessageId` that implements `PulsarApiMessageId`, so why would
it have to return null? I think keeping the topic name out of
MessageIdData just avoids transmitting a useless field at the network
layer. LedgerId and EntryId are in PulsarApiMessageId, so why shouldn't
`topicName` be added as well?
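
As a rough illustration of the two designs being discussed, the Java sketch
below models the interfaces with toy types. The interface names follow the
proposals, but the method sets and the classes `EntryId` and `TopicEntryId`
are assumptions made for this example, not the real Pulsar API. It shows the
layout Yunze describes, where only the multi-topics id carries a topic name,
so no implementation has to return null.

```java
public class MessageIdSketch {
    // Assumed shape: exposes fields that map to MessageIdData.
    interface PulsarApiMessageId {
        long getLedgerId();
        long getEntryId();
    }

    // Assumed shape: adds the topic name needed by a multi-topics consumer.
    interface TopicMessageId {
        String getTopicName();
    }

    // Hypothetical id for a single-topic consumer: no topic name at all.
    static class EntryId implements PulsarApiMessageId {
        private final long ledgerId;
        private final long entryId;

        EntryId(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        public long getLedgerId() {
            return ledgerId;
        }

        public long getEntryId() {
            return entryId;
        }
    }

    // Hypothetical id for a multi-topics consumer: implements both interfaces.
    static class TopicEntryId extends EntryId implements TopicMessageId {
        private final String topicName;

        TopicEntryId(String topicName, long ledgerId, long entryId) {
            super(ledgerId, entryId);
            this.topicName = topicName;
        }

        public String getTopicName() {
            return topicName;
        }
    }

    // Returns the topic name when the id carries one, otherwise the fallback.
    static String topicOf(PulsarApiMessageId id, String fallback) {
        if (id instanceof TopicMessageId) {
            return ((TopicMessageId) id).getTopicName();
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(topicOf(new EntryId(1, 1), "my-topic"));
        System.out.println(topicOf(new TopicEntryId("topic-a", 1, 1), "my-topic"));
    }
}
```

If `PulsarApiMessageId` extended `TopicMessageId` instead, `EntryId` would
also need a `getTopicName()` implementation, which is the case under debate.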

Thanks,
Bo

Yunze Xu  于2022年12月18日周日 14:23写道:
>
> Hi Bo,
>
> Because the topic name is not a part of MessageIdData. It's only used
> to find the correct internal consumer of a multi-topics consumer.
>
> > All I can think of is PulsarApiMessageId extend TopicMessageId(PIP-224[1]) 
> > right?
>
> No. The `TopicMessageId` could only be used in a multi-topics
> consumer. For a single-topic consumer, wrapping the topic name is
> redundant and might break the existing behavior. In this case, if
> `PulsarApiMessageId` extends `TopicMessageId`, the `getTopicName()`
> method should return null, which is not a good design [1][2].
>
> After both PIP-224 and PIP-229 are approved, the `TopicMessageIdImpl`
> will implement both `PulsarApiMessageId` and `TopicMessageId`
> interfaces. Other `MessageId` implementations only need to implement
> `PulsarApiMessageId`.
>
> BTW, PIP-224 mainly solves two problems:
> 1. When a multi-topics consumer acknowledges a `MessageId` that is not
> a `TopicMessageId`, a `PulsarClientException.NotAllowedException` will
> be thrown in synchronous methods. The asynchronous methods should not
> throw an exception.
> 2. For a multi-topics consumer, support seeking with a `TopicMessageId`.
>
> PIP-224 is designed for application users to specify an associated
> topic name when using a `MessageId` in `seek` or `acknowledge` on a
> multi-topics consumer. PIP-229 is more like a refactoring to allow the
> experienced developers access the fields of `MessageIdData` via a
> standard interface.
>
> [1] https://github.com/apache/pulsar/issues/18616#issuecomment-1328609346
> [2] https://lists.apache.org/thread/g8o0qtljllxnvck69dn36205xg5xr8cc
>
> Thanks,
> Yunze
>
>
> On Fri, Dec 16, 2022 at 8:50 PM 丛搏  wrote:
> >
> > Abstraction based on MessageIdData is a good solution. I don't have
> > any discussion context. Why don't we put the topic name in it?
> >
> > All I can think of is PulsarApiMessageId extend
> > TopicMessageId(PIP-224[1]) right?
> >
> > Thanks,
> > Bo
> > [1] https://github.com/apache/pulsar/issues/18616
> >
> > Yunze Xu  于2022年12月16日周五 15:59写道:
> > >
> > > Hi all,
> > >
> > > I've opened a PIP to discuss: 
> > > https://github.com/apache/pulsar/issues/18950
> > >
> > > Currently the `MessageId` interface is not friendly to developers of
> > > Pulsar core and ecosystems. There is no abstraction of the
> > > `MessageIdData` defined in `PulsarApi.proto`.
> > >
> > > This proposal aims at solving this problem and allows more loose type
> > > assertions when using `seek` and `acknowledge`.
> > >
> > > You can also see the demo for reference:
> > > https://github.com/BewareMyPower/pulsar/pull/11
> > >
> > > (Sorry I forgot to add the [DISCUSS] prefix again in the previous
> > > email, let's continue the discussion here)
> > >
> > > Thanks,
> > > Yunze


Re: [DISCUSS] Proposal required for Admin API/CLI and metrics changes

2022-12-17 Thread
> Is it time to require a proposal for Admin API/CLI and metrics changes?

Yes, we need a proposal for Admin API/CLI and metrics changes. But it is
difficult to judge through automatic detection whether a PR is a bug
fix, needs a PIP, or is doc-required. We could standardize the
committer review process and expand the review guidelines, which may
achieve better results.
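
For reference, the checklist-based labeling suggested earlier in this thread
could be sketched roughly as follows. This is only an illustration: the class
`PrChecklist`, the mapping from checklist items to labels, and the exact label
strings are assumptions, and a committer would still need to confirm the
result during review.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class PrChecklist {
    // Scans a PR description for checked boxes ("- [x] ...") and suggests labels.
    // The item-to-label mapping below is hypothetical.
    static Set<String> suggestLabels(String prBody) {
        Set<String> labels = new LinkedHashSet<>();
        for (String line : prBody.split("\n")) {
            String t = line.trim();
            if (!t.startsWith("- [x]") && !t.startsWith("- [X]")) {
                continue; // unchecked box or unrelated line
            }
            String item = t.substring(5).trim();
            if (item.equals("The public API")
                    || item.equals("The REST endpoints")
                    || item.equals("The admin CLI options")
                    || item.equals("The metrics")) {
                labels.add("wants/proposal");
                labels.add("doc-required");
            } else if (item.equals("The default values of configurations")) {
                labels.add("doc-required");
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String body = "- [ ] The schema\n- [x] The admin CLI options\n";
        System.out.println(suggestLabels(body));
    }
}
```

A check like this only reads what the author ticked, so it cannot tell a bug
fix from a behavior change on its own.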

Thanks,
Bo

PengHui Li  于2022年12月17日周六 09:00写道:
>
> I have pushed out a PR to update the PR template and PIP for metrics changes
>
> https://github.com/apache/pulsar/pull/18961
>
> PTAL.
>
> Thanks,
> Penghui
>
> On Wed, Dec 7, 2022 at 4:17 PM Haiting Jiang  wrote:
>
> > +1 for enforcing the PIP procedures.
> >
> > > And the CI can try to add labels `doc required` or `wants/proposal`
> > > according to the list selections.
> >
> > Is it possible that the CI can check if there is a "voted" PIP linking
> > to this PR.
> > And the label can be manually added by committers if the PR author
> > missed checking the boxes.
> >
> > Thanks,
> > Haiting
> >
> > On Wed, Dec 7, 2022 at 4:07 PM PengHui Li  wrote:
> > >
> > > > I agree a proposal would be better before adding a PR. But the
> > > document part must be a part of such a proposal.
> > >
> > > Make sense. It looks like we should have a checklist for the proposal.
> > > The documentation changes should be listed in the proposal.
> > >
> > > > Can the PR template/GitHub process check that if either the api changes
> > > and doc-required are checked both are checked with textual information
> > > provided?
> > >
> > > It's a good idea.
> > > I haven't tried, but it looks like it's possible.
> > > We have this list:
> > >
> > > ```
> > > ### Does this pull request potentially affect one of the following parts:
> > >
> > > *If the box was checked, please highlight the changes*
> > >
> > > - [ ] Dependencies (add or upgrade a dependency)
> > > - [ ] The public API
> > > - [ ] The schema
> > > - [ ] The default values of configurations
> > > - [ ] The binary protocol
> > > - [ ] The REST endpoints
> > > - [ ] The admin CLI options
> > > - [ ] Anything that affects deployment
> > > ```
> > >
> > > And the CI can try to add labels `doc required` or `wants/proposal`
> > > according to the
> > > list selections.
> > >
> > > And we can add `The metrics` item to the list.
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Wed, Dec 7, 2022 at 1:52 PM Dave Fisher 
> > wrote:
> > >
> > > >
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Dec 6, 2022, at 9:45 PM, Yunze Xu 
> > > > wrote:
> > > > >
> > > > > Hi Penghui,
> > > > >
> > > > >> But maybe some are missed.
> > > > >
> > > > > That's the point. Each PR that adds or modifies a metric item must be
> > > > > labeled with "doc-required" and the related documents should be
> > added.
> > > > > However, these PRs are nearly all labeled with "doc-not-needed".
> > > > >
> > > > > I agree a proposal would be better before adding a PR. But the
> > > > > document part must be a part of such a proposal.
> > > >
> > > > Can the PR template/GitHub process check that if either the api changes
> > > > and doc-required are checked both are checked with textual information
> > > > provided?
> > > >
> > > > Best,
> > > > Dave
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > >> On Wed, Dec 7, 2022 at 11:48 AM PengHui Li 
> > wrote:
> > > > >>
> > > > >> Hi Yunze,
> > > > >>
> > > > >> All the metrics are listed here
> > > > >> https://pulsar.apache.org/docs/2.10.x/reference-metrics/
> > > > >>
> > > > >> But maybe some are missed.
> > > > >>
> > > > >> Thanks,
> > > > >> Penghui
> > > > >>
> > > > >> On Wed, Dec 7, 2022 at 11:46 AM Yunze Xu
> > 
> > > > >> wrote:
> > > > >>
> > > > >>> I agree. It should have required the PIP.
> > > > >>>
> > > > >>> I have another question. Is there any document to describe these
> > > > >>> metrics? I think the metrics body should be documented well to
> > avoid
> > > > >>> breaking changes. Some external applications might parse the
> > metrics
> > > > >>> according to a specific structure.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Yunze
> > > > >>>
> > > > >>> On Wed, Dec 7, 2022 at 11:38 AM PengHui Li 
> > wrote:
> > > > 
> > > >  Hi all,
> > > > 
> > > >  I would like to start a discussion about requiring a proposal for
> > > > Admin
> > > >  API/CLI
> > > >  and metrics changes.
> > > > 
> > > >  Here are some recent examples that changed the Admin API but
> > without
> > > >  proposals.
> > > >  I just checked the commit logs. Maybe some have a proposal. Just
> > > > forgot
> > > > >>> to
> > > >  add
> > > >  the proposal link to the PR.
> > > > 
> > > >  https://github.com/apache/pulsar/pull/18218
> > > >  https://github.com/apache/pulsar/pull/17153
> > > >  https://github.com/apache/pulsar/pull/16167
> > > >  https://github.com/apache/pulsar/pull/14930
> > > >  https://github.com/apache/pulsar/pull/17337
> > > > 

Re: [DISCUSS] PIP-229: Add a common interface to get fields of MessageIdData

2022-12-16 Thread
Abstraction based on MessageIdData is a good solution. I don't have
any discussion context. Why don't we put the topic name in it?

The only approach I can think of is for PulsarApiMessageId to extend
TopicMessageId (PIP-224 [1]), right?

Thanks,
Bo
[1] https://github.com/apache/pulsar/issues/18616

Yunze Xu  于2022年12月16日周五 15:59写道:
>
> Hi all,
>
> I've opened a PIP to discuss: https://github.com/apache/pulsar/issues/18950
>
> Currently the `MessageId` interface is not friendly to developers of
> Pulsar core and ecosystems. There is no abstraction of the
> `MessageIdData` defined in `PulsarApi.proto`.
>
> This proposal aims at solving this problem and allows more loose type
> assertions when using `seek` and `acknowledge`.
>
> You can also see the demo for reference:
> https://github.com/BewareMyPower/pulsar/pull/11
>
> (Sorry I forgot to add the [DISCUSS] prefix again in the previous
> email, let's continue the discussion here)
>
> Thanks,
> Yunze


Re: [DISCUSS] PIP-230: Modify MessageIdImpl and BatchMessageIdImpl compareTo(MessageId o) method

2022-12-16 Thread
> +1, LGTM but it should only be applied in 2.12.0 or later.

Yes, I agree with you.

Thanks,
Bo

Yunze Xu  于2022年12月16日周五 19:02写道:
>
> +1, LGTM but it should only be applied in 2.12.0 or later.
>
> Thanks,
> Yunze
>
> On Fri, Dec 16, 2022 at 6:26 PM Enrico Olivelli  wrote:
> >
> > I support this solution.
> >
> > Unfortunately it is better to not port this to the previous versions
> > because it is a behaviour change.
> >
> > Thanks for driving this effort
> >
> > Enrico
> >
> > Il giorno ven 16 dic 2022 alle ore 10:31 丛搏  ha scritto:
> > >
> > > Hi, pulsar community:
> > >
> > > I start a PIP about modifying MessageIdImpl and BatchMessageIdImpl
> > > compareTo(MessageId o) method
> > >
> > > original discussion thread:
> > > https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg
> > >
> > > PIP: https://github.com/apache/pulsar/issues/18957
> > >
> > > Thanks,
> > > Bo


Re: [DISCUSS] PIP-230: Modify MessageIdImpl and BatchMessageIdImpl compareTo(MessageId o) method

2022-12-16 Thread
> Unfortunately it is better to not port this to the previous versions
> because it is a behaviour change.

Yes, it will only be included in the new release, and we will describe
this change in the release blog.

Thanks,
Bo

Enrico Olivelli  于2022年12月16日周五 18:26写道:
>
> I support this solution.
>
> Unfortunately it is better to not port this to the previous versions
> because it is a behaviour change.
>
> Thanks for driving this effort
>
> Enrico
>
> Il giorno ven 16 dic 2022 alle ore 10:31 丛搏  ha scritto:
> >
> > Hi, pulsar community:
> >
> > I start a PIP about modifying MessageIdImpl and BatchMessageIdImpl
> > compareTo(MessageId o) method
> >
> > original discussion thread:
> > https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg
> >
> > PIP: https://github.com/apache/pulsar/issues/18957
> >
> > Thanks,
> > Bo


[DISCUSS] PIP-230: Modify MessageIdImpl and BatchMessageIdImpl compareTo(MessageId o) method

2022-12-16 Thread
Hi, pulsar community:

I have started a PIP about modifying the compareTo(MessageId o) method
of MessageIdImpl and BatchMessageIdImpl.

original discussion thread:
https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg

PIP: https://github.com/apache/pulsar/issues/18957

Thanks,
Bo


Re: [VOTE] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-15 Thread
If you don't change this in PIP-229 or PIP-224, I will create a new
PIP to handle the `compareTo()` method of `BatchMessageIdImpl` and
`MessageIdImpl`. I now have no problem with this PIP.
+1 (non-binding)
Sorry for disturbing this PIP vote.

Thanks,
Bo

Yunze Xu  于2022年12月16日周五 11:58写道:
>
> If this breaking change can pass the PMC votes, I will keep the new
> semantics in PIP-229. Otherwise, it would not make sense to adopt the
> new semantics in PIP-229.
>
> Thanks,
> Yunze
>
> On Fri, Dec 16, 2022 at 11:46 AM Yunze Xu  wrote:
> >
> > I cannot find any confusing code from the proposal itself. Could you
> > point it out? If you are mentioning the `legacyCompare` and `compare`
> > methods in #18890 [1], it's not related to this proposal. And I have
> > opened PIP-229 [2] for discussion.
> >
> > BTW, the PIP-229 itself doesn't mention the compare logic. But I'm not
> > going to adopt the new semantics because it's actually a breaking
> > change, just as I've replied. You might think it's a bug, but it's a
> > public API. Any change of the semantics in the public API is a
> > breaking change.
> >
> > [1] https://github.com/apache/pulsar/pull/18890/files
> > [2] https://lists.apache.org/thread/x52zpwlo8pxzp81oxllh5vw82kyrzgpk
> >
> > On Fri, Dec 16, 2022 at 11:34 AM 丛搏  wrote:
> > >
> > > Although unrelated, it adds a lot of confusing code.
> > >
> > > Thanks,
> > > Bo
> > >
> > > Yunze Xu  于2022年12月16日周五 08:05写道:
> > >
> > > >
> > > > This proposal is not related to the comparison logic between
> > > > BatchMessageIdImpl and MessageIdImpl.
> > > >
> > > > Thanks,
> > > > Yunze
> > > >
> > > > On Thu, Dec 15, 2022 at 12:58 PM 丛搏  wrote:
> > > > >
> > > > > -1 (non-binding)
> > > > > sorry, I have one question about the BatchMessageId compareTo()
> > > > > method. the discussion mail :
> > > > > https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg.
> > > > > I hope it can be this issue can be discussed clearly.
> > > > >
> > > > > I hope it can be this issue can be discussed clearly. I will retry to
> > > > > vote until this issue clearly :
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Bo
> > > > >
> > > > > 丛搏  于2022年12月14日周三 22:56写道:
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Thanks,
> > > > > > Bo
> > > > > >
> > > > > > PengHui Li  于2022年12月14日周三 19:12写道:
> > > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > - Penghui
> > > > > > >
> > > > > > > On Sun, Dec 11, 2022 at 6:36 AM Enrico Olivelli 
> > > > > > >  wrote:
> > > > > > >
> > > > > > > > +1 (binding)
> > > > > > > >
> > > > > > > > Enrico
> > > > > > > >
> > > > > > > > Il Ven 9 Dic 2022, 10:41 Jiaqi Shen  ha 
> > > > > > > > scritto:
> > > > > > > >
> > > > > > > > > +1(non-binding)
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jiaqi Shen
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  于2022年12月5日周一 15:23写道:
> > > > > > > > >
> > > > > > > > > > +1(non-binding)
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Mattison
> > > > > > > > > > On Dec 5, 2022, 15:09 +0800, Zike Yang , 
> > > > > > > > > > wrote:
> > > > > > > > > > > +1(non-binding)
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Zike Yang
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 5, 2022 at 2:41 PM Baodi Shi
> > > > > > > >  > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > +1(non-binding)
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Baodi Shi
> > > > > > > > > > > >
> > > > > > > > > > > > > > 2022年12月5日 12:51,Yunze Xu 
> > > > > > > > > > > > > >  写道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm starting the vote for PIP-224: Introduce 
> > > > > > > > > > > > > > TopicMessageId for
> > > > > > > > > > > > > > consumer's MessageId related APIs:
> > > > > > > > > > > > > > https://github.com/apache/pulsar/issues/18616
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Here is the discussion thread:
> > > > > > > > > > > > > >
> > > > > > > > https://lists.apache.org/thread/jhqy65cdyxzmmxnfsjm8rv9pbk76noxy
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The vote will be open for at least 3 days.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Yunze
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >


Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

2022-12-15 Thread
> When the input messages are raw bytes, we cannot guarantee the
> validation always succeeds because the schema might change. The
> exception is actually thrown in `TypedMessageBuilder#value`.
>
> But since these APIs are stable, we could only fix it by adding the
> documents to describe in which cases could `TypedMessageBuilder#value`
> and `Message#getValue` throw exceptions.

Yes, this is a better way to handle it. I agree with improving the
AUTO_PRODUCE schema in this way. It makes a lot of sense to help users
use the API correctly.

Thanks,
Bo

Yunze Xu  于2022年12月16日周五 12:16写道:
>
> > It is irresponsible behavior of the producer to leave everything to the 
> > consumer.
>
> I agreed now.
>
> > I think what we need to do is describe the document clearly
>
> IMO, it's a code problem because there is no exception signature for
> `TypedMessageBuilder#value` and `Message#getValue`. The application
> users should catch the exception. It could be better if the exception
> is thrown during `send` or `receive` and wrapped into
> `PulsarClientException`.
>
> When the input messages are raw bytes, we cannot guarantee the
> validation always succeeds because the schema might change. The
> exception is actually thrown in `TypedMessageBuilder#value`.
>
> But since these APIs are stable, we could only fix it by adding the
> documents to describe in which cases could `TypedMessageBuilder#value`
> and `Message#getValue` throw exceptions.
>
> Thanks,
> Yunze
>
> On Thu, Dec 15, 2022 at 12:48 PM 丛搏  wrote:
> >
> > We also can use BYTES producer, but in BYTES schema, do not use
> > .newMessage(schema0), the message will not carry the schema version.
> > the consumer will not decode correctly.
> >
> > and BYTES schema can't validate the data schema. if the data is empty
> > bytes array, It does not make sense to send it to the broker.
> >
> > It is irresponsible behavior of the producer to leave everything to
> > the consumer. I think AUTO_PRODUCER simplifies the data validation
> > process for users.
> >
> > I think what we need to do is describe the document clearly and
> > distinguish it from BYTES rather than delete or deprecate it.
> >
> > Thanks,
> > Bo
> >
> >
> > Yunze Xu  于2022年12月14日周三 23:36写道:
> >
> > >
> > > Why not use the following code with a BYTES producer in your case?
> > >
> > > ```java
> > > var schema0 = Schema.AVRO(SchemaDefinition.builder()
> > > .withJsonDef("student with version0 json def").build());
> > > p.newMessage(schema0).value(schema0.decode(student1)).send();
> > > ...
> > > ```
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Wed, Dec 14, 2022 at 10:37 PM 丛搏  wrote:
> > > >
> > > > Yunze Xu  于2022年12月14日周三 20:37写道:
> > > > >
> > > > > > how do you can create two Student.class in one java process? and use
> > > > > the same namespace?
> > > > >
> > > > > Could you give an example to show how `AUTO_PRODUCE` schema makes a 
> > > > > difference?
> > > >
> > > > // this is Student use version0, may be data from kafka
> > > > byte[] student1 = autoConsumer.receive().getData();
> > > > // this is Student use version1, may be data from kafka
> > > > byte[] student2 = autoConsumer.receive().getData();
> > > > // send student with version0 schema data
> > > > p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
> > > > .withJsonDef("student with version0 json def").build())))
> > > > .value(student1).send();
> > > >
> > > > // send student with version1 schema data
> > > > p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
> > > > .withJsonDef("student with version1 json def").build())))
> > > > .value(student2).send();
> > > >
> > > > >
> > > > > But with AUTO_PRODUCE schema, the precondition is that we have a topic
> > > > > that has messages of these two schemas.
> > > > >
> > > > > For example, there is a `bytes-topic` without schema that has two 
> > > > > messages:
> > > > > - msg0: Serialized from `new Student("abc")` (schema v0)
> > > > > - msg1: Serialized from `new Student("abc", 1)` (schema v1)
> > > > >
> > > > > Then you can consume these bytes, and send the messages to

Re: [VOTE] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-15 Thread
Although unrelated, it adds a lot of confusing code.

Thanks,
Bo

Yunze Xu  于2022年12月16日周五 08:05写道:

>
> This proposal is not related to the comparison logic between
> BatchMessageIdImpl and MessageIdImpl.
>
> Thanks,
> Yunze
>
> On Thu, Dec 15, 2022 at 12:58 PM 丛搏  wrote:
> >
> > -1 (non-binding)
> > sorry, I have one question about the BatchMessageId compareTo()
> > method. the discussion mail :
> > https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg.
> > I hope it can be this issue can be discussed clearly.
> >
> > I hope it can be this issue can be discussed clearly. I will retry to
> > vote until this issue clearly :
> >
> >
> > Thanks,
> > Bo
> >
> > 丛搏  于2022年12月14日周三 22:56写道:
> > >
> > > +1 (non-binding)
> > >
> > > Thanks,
> > > Bo
> > >
> > > PengHui Li  于2022年12月14日周三 19:12写道:
> > > >
> > > > +1 (binding)
> > > >
> > > > - Penghui
> > > >
> > > > On Sun, Dec 11, 2022 at 6:36 AM Enrico Olivelli  
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Enrico
> > > > >
> > > > > Il Ven 9 Dic 2022, 10:41 Jiaqi Shen  ha 
> > > > > scritto:
> > > > >
> > > > > > +1(non-binding)
> > > > > >
> > > > > > Thanks,
> > > > > > Jiaqi Shen
> > > > > >
> > > > > >
> > > > > >  于2022年12月5日周一 15:23写道:
> > > > > >
> > > > > > > +1(non-binding)
> > > > > > >
> > > > > > > Best,
> > > > > > > Mattison
> > > > > > > On Dec 5, 2022, 15:09 +0800, Zike Yang , wrote:
> > > > > > > > +1(non-binding)
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Zike Yang
> > > > > > > >
> > > > > > > > On Mon, Dec 5, 2022 at 2:41 PM Baodi Shi
> > > > >  > > > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > +1(non-binding)
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Baodi Shi
> > > > > > > > >
> > > > > > > > > > > 2022年12月5日 12:51,Yunze Xu  
> > > > > > > > > > > 写道:
> > > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I'm starting the vote for PIP-224: Introduce 
> > > > > > > > > > > TopicMessageId for
> > > > > > > > > > > consumer's MessageId related APIs:
> > > > > > > > > > > https://github.com/apache/pulsar/issues/18616
> > > > > > > > > > >
> > > > > > > > > > > Here is the discussion thread:
> > > > > > > > > > >
> > > > > https://lists.apache.org/thread/jhqy65cdyxzmmxnfsjm8rv9pbk76noxy
> > > > > > > > > > >
> > > > > > > > > > > The vote will be open for at least 3 days.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Yunze
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >


Re: [DISCUSS] Modify MessageIdImpl and BatchMessageIdImpl compareTo(MessageId o) method

2022-12-15 Thread
It doesn't matter whether a new PIP is opened. We can describe it clearly in
https://lists.apache.org/thread/mbrpjsgrgwrlkdpvkk738jxnlk7rf4qk and
modify it there. But whether it is a newly opened PIP or
https://lists.apache.org/thread/mbrpjsgrgwrlkdpvkk738jxnlk7rf4qk, my
point is that we need to fix this behavior, not stay compatible with it.

Thanks,
Bo

Yunze Xu  于2022年12月16日周五 08:21写道:
>
> Let's see another example that is considered as a "bug", not a breaking 
> change.
>
> https://lists.apache.org/thread/88t1xxf68j092k09srdwyzj1tk4ml5n9
>
> > I think that this is fixing a bug, if the topic does not exist we should 
> > return "not found".
>
> The "bug" and "breaking change" are not in contrast. What a user
> thinks is a bug might be a breaking change. Would you like to open a
> PIP for that? From my perspective, both modifying or not is okay.
>
> Thanks,
> Yunze
>
> On Wed, Dec 14, 2022 at 11:27 PM 丛搏  wrote:
> >
> > I still feel better to change compareTo directly.
> >
> > 1. Although using PulsarApiMessageId.compare() can reduce the
> > probability of developers misusing it, misuse cannot be completely
> > avoided.
> >
> > 2. While a direct change would change the default behavior, I consider
> > it a bug, not a breaking change. We can explain it in the new version
> > release blog. Maybe some users use it, but they didn’t find the
> > problem, and we changed it correctly . I don't think any user will be
> > able to use the current compareTo() correctly. Because the current
> > implementation is unexpected. When the user finds out that this
> > problem exists, he will not use this method.
> >
> > Thanks,
> > Bo
> >
> > Yunze Xu  于2022年12月8日周四 20:43写道:
> > >
> > > Actually I'm refactoring the MessageId related code [1], whose current
> > > implementations are very messy from my perspective. My solution to
> > > this issue is adding two compare methods, one of them is the "wrong"
> > > implementation and used in `MessageId#compareTo` to avoid the breaking
> > > change. See the `legacyCompare` and `compare` methods.
> > >
> > > ```java
> > > // The legacy compare method, which treats the non-batched message id
> > > // as preceding the batched message id.
> > > // However, this behavior is wrong because a non-batched message id
> > > // represents an entry, while a batched message id represents a single
> > > // message in the entry, which should precede the non-batched message id.
> > > // Keep this implementation just for backward compatibility when users
> > > // compare two message ids.
> > > static int legacyCompare(MessageIdDataInterface lhs,
> > >         MessageIdDataInterface rhs) { /* ... */ }
> > >
> > > static int compare(MessageIdDataInterface lhs,
> > >         MessageIdDataInterface rhs) { /* ... */ }
> > > ```
> > >
> > > [1] https://github.com/BewareMyPower/pulsar/pull/11/files
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Thu, Dec 8, 2022 at 7:22 PM 丛搏  wrote:
> > > >
> > > > Hi, Yunze:
> > > > If we don't change this behavior, we should pay special attention when
> > > > coding `pulsar-client`, because it is a point that is easy to
> > > > overlook. its impact may be more serious than "wrong " behavior
> > > > produced by the user using the current compareTo() method manually. I
> > > > don’t think this is a breaking change. On the contrary, it is a bug
> > > > that needs to be fixed. Because we cannot guarantee that everyone can
> > > > find the problem of compareTo() in time when writing code or reviewing
> > > > pr. The current implementation is Very anti-human.
> > > >
> > > > Thanks,
> > > > bo
> > > >
> > > > Yunze Xu  于2022年12月8日周四 18:02写道:
> > > > >
> > > > > Actually, from the user side, this comparison would never happen.
> > > > > Users could never receive two MessageId objects with the same ledger
> > > > > id, entry id while the batch index fields are different. This
> > > > > comparison could only exist in the `pulsar-client` implementation.
> > > > >
> > > > > If users touch the case, the MessageId object must be created
> > > > > manually, which is a hack. The "wrong" behavior might be used. So my
> > > > > perspective is that we should not change this behavior.
> > > > >
> > > > > Thanks,
> > 

Re: [VOTE] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-14 Thread
-1 (non-binding)
Sorry, I have one question about the BatchMessageId compareTo()
method. The discussion thread:
https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg.

I hope this issue can be discussed clearly. I will vote again once
it has been clarified.


Thanks,
Bo

丛搏  于2022年12月14日周三 22:56写道:
>
> +1 (non-binding)
>
> Thanks,
> Bo
>
> PengHui Li  于2022年12月14日周三 19:12写道:
> >
> > +1 (binding)
> >
> > - Penghui
> >
> > On Sun, Dec 11, 2022 at 6:36 AM Enrico Olivelli  wrote:
> >
> > > +1 (binding)
> > >
> > > Enrico
> > >
> > > Il Ven 9 Dic 2022, 10:41 Jiaqi Shen  ha scritto:
> > >
> > > > +1(non-binding)
> > > >
> > > > Thanks,
> > > > Jiaqi Shen
> > > >
> > > >
> > > >  于2022年12月5日周一 15:23写道:
> > > >
> > > > > +1(non-binding)
> > > > >
> > > > > Best,
> > > > > Mattison
> > > > > On Dec 5, 2022, 15:09 +0800, Zike Yang , wrote:
> > > > > > +1(non-binding)
> > > > > >
> > > > > > Best,
> > > > > > Zike Yang
> > > > > >
> > > > > > On Mon, Dec 5, 2022 at 2:41 PM Baodi Shi
> > >  > > > >
> > > > > wrote:
> > > > > > >
> > > > > > > +1(non-binding)
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Baodi Shi
> > > > > > >
> > > > > > > > > 2022年12月5日 12:51,Yunze Xu  写道:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I'm starting the vote for PIP-224: Introduce TopicMessageId 
> > > > > > > > > for
> > > > > > > > > consumer's MessageId related APIs:
> > > > > > > > > https://github.com/apache/pulsar/issues/18616
> > > > > > > > >
> > > > > > > > > Here is the discussion thread:
> > > > > > > > >
> > > https://lists.apache.org/thread/jhqy65cdyxzmmxnfsjm8rv9pbk76noxy
> > > > > > > > >
> > > > > > > > > The vote will be open for at least 3 days.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Yunze
> > > > > > >
> > > > >
> > > >
> > >


Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

2022-12-14 Thread
We can also use a BYTES producer, but with the BYTES schema, if we do
not use .newMessage(schema0), the message will not carry the schema
version, and the consumer will not be able to decode it correctly.

Also, the BYTES schema cannot validate the data against a schema. If the
data is an empty byte array, it does not make sense to send it to the
broker.

It is irresponsible for the producer to leave everything to the
consumer. I think AUTO_PRODUCE simplifies the data validation process
for users.

I think what we need to do is describe it clearly in the documentation
and distinguish it from BYTES, rather than delete or deprecate it.
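
To make the difference concrete, here is a minimal sketch of the two
producer behaviors. It is only a toy model and does not use the Pulsar
client: `ToySchema`, `StoredMessage`, `sendRaw` and `sendValidated` are
hypothetical names invented for this illustration, and a payload is
reduced to the set of field names it contains.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these types are Pulsar classes.
final class ProducerSideValidationSketch {

    // Hypothetical stand-in for a schema: a version plus required field names.
    static final class ToySchema {
        final int version;
        final Set<String> requiredFields;

        ToySchema(int version, String... requiredFields) {
            this.version = version;
            this.requiredFields = new HashSet<>(Arrays.asList(requiredFields));
        }

        boolean accepts(Set<String> fields) {
            return fields.containsAll(requiredFields);
        }
    }

    // Hypothetical stand-in for what ends up stored in the topic.
    static final class StoredMessage {
        final Integer schemaVersion; // null means no schema version is attached
        final Set<String> fields;

        StoredMessage(Integer schemaVersion, Set<String> fields) {
            this.schemaVersion = schemaVersion;
            this.fields = fields;
        }
    }

    // BYTES-like path: nothing is checked and no schema version is attached.
    static StoredMessage sendRaw(Set<String> fields) {
        return new StoredMessage(null, fields);
    }

    // AUTO_PRODUCE-like path: invalid data is rejected before it is stored,
    // and accepted data is tagged with the schema version it was checked against.
    static StoredMessage sendValidated(ToySchema schema, Set<String> fields) {
        if (!schema.accepts(fields)) {
            throw new IllegalArgumentException(
                    "payload does not match schema v" + schema.version);
        }
        return new StoredMessage(schema.version, fields);
    }

    public static void main(String[] args) {
        ToySchema v1 = new ToySchema(1, "name", "age");
        Set<String> empty = Collections.emptySet();

        // The raw path stores the empty payload without complaint.
        StoredMessage raw = sendRaw(empty);
        System.out.println("raw schema version: " + raw.schemaVersion);

        // The validated path refuses the same payload at the producer.
        try {
            sendValidated(v1, empty);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at the producer: " + e.getMessage());
        }
    }
}
```

In this model the consumer of a raw message has to guess how to decode
it, while a validated message always carries the version it was checked
against.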

Thanks,
Bo


Yunze Xu wrote on Wed, Dec 14, 2022 at 23:36:

>
> Why not use the following code with a BYTES producer in your case?
>
> ```java
> var schema0 = Schema.AVRO(SchemaDefinition.builder()
> .withJsonDef("student with version0 json def").build());
> p.newMessage(schema0).value(schema0.decode(student1)).send();
> ...
> ```
>
> Thanks,
> Yunze
>
> On Wed, Dec 14, 2022 at 10:37 PM 丛搏  wrote:
> >
> > Yunze Xu wrote on Wed, Dec 14, 2022 at 20:37:
> > >
> > > > > How can you create two Student.class in one Java process and use
> > > > the same namespace?
> > >
> > > Could you give an example to show how `AUTO_PRODUCE` schema makes a 
> > > difference?
> >
> > // This Student uses version0; the data may come from Kafka.
> > byte[] student1 = autoConsumer.receive().getData();
> > // This Student uses version1; the data may come from Kafka.
> > byte[] student2 = autoConsumer.receive().getData();
> > // Send the student with version0 schema data.
> > p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
> > .withJsonDef("student with version0 json def").build())))
> > .value(student1).send();
> >
> > // Send the student with version1 schema data.
> > p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
> > .withJsonDef("student with version1 json def").build())))
> > .value(student2).send();
> >
> > >
> > > But with AUTO_PRODUCE schema, the precondition is that we have a topic
> > > that has messages of these two schemas.
> > >
> > > For example, there is a `bytes-topic` without schema that has two 
> > > messages:
> > > - msg0: Serialized from `new Student("abc")` (schema v0)
> > > - msg1: Serialized from `new Student("abc", 1)` (schema v1)
> > >
> > > Then you can consume these bytes, and send the messages to **a topic
> > > that has registered a schema**.
> > > - If the schema is v0, it's okay to send msg0 and msg1 to the topic.
> > > But the msg1 will lose some bytes because the schema v0 doesn't have
> > > the `age` field.
> > > - If the schema is v1, msg0 cannot be sent because msg0 doesn't have
> > > the `age` field.
> > >
> > > So which schema did you expect for this topic?
> > If you use AUTO_PRODUCE_BYTES, the message will have the correct schema
> > version.
> > Code link:
> > https://github.com/apache/pulsar/blob/4129583c418dd68f8303dee601132e2910cdf8e6/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L718-L746
> >
> > msg0 will be sent with schema v0.
> > msg1 will be sent with schema v1.
> > >
> > > This example also shows AUTO_PRODUCE schema performs validation at
> > > producer side.
> > >
> > > However, if we just send msg0 and msg1 to a topic without schema. Then
> > > it will be consumer's responsibility to determine whether the received
> > > message is valid.
> > >
> > > ```java
> > > var bytes = consumer.receive(); // bytes
> > > var student = Schema.AVRO(Student.class).decode(bytes);
> > > ```
> > >
> > > - If the `Student` is v0, msg0 and msg1 can be decoded successfully.
> > > - If the `Student` is v1, decoding msg0 will throw an exception.
> > >
> > > Since all messages are stored in the topic, the downstream side
> > > (consumer) can catch the exception to discard the bytes without the
> > > expected schema.
> > >
> > > But if the validation fails at the producer side, there is a chance
> > > that msg0 is lost. In addition, let's see the producer and consumer
> > > code in this case.
> > >
> > > ```
> > > producer.send(msg0); // validation happens at the producer side
> > > ```
> > >
> > > ```
> > > var msg = consumer.receive();
> > > var student = msg.getValue(); // validation happens again, though 

Re: [DISCUSS] Modify MessageIdImpl and BatchMessageIdImpl compareTo(MessageId o) method

2022-12-14 Thread
I still think it is better to change compareTo() directly.

1. Although using PulsarApiMessageId.compare() can reduce the
probability of developers misusing it, misuse cannot be completely
avoided.

2. While a direct change would alter the default behavior, I consider
it a bug fix, not a breaking change. We can explain it in the release
blog for the new version. Maybe some users rely on it without having
noticed the problem, and the change would make it correct for them. I
don't think any user is able to use the current compareTo() correctly,
because the current implementation behaves unexpectedly. Once a user
finds out that this problem exists, they will stop using this method.
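
To illustrate the two orderings being discussed, here is a minimal
sketch. It is a toy model, not the real Pulsar implementation: the `Id`
class, the `NO_BATCH` marker, and the bodies of `legacyCompare` and
`proposedCompare` are assumptions written only for this example. It
assumes a non-batched id can be modeled as a batch index of -1, which
reproduces the behavior described in this thread.

```java
// Toy model only: these are NOT the real Pulsar MessageId classes.
final class MessageIdOrderSketch {

    // Hypothetical marker for an id that refers to a whole entry.
    static final int NO_BATCH = -1;

    static final class Id {
        final long ledgerId;
        final long entryId;
        final int batchIndex; // NO_BATCH when the id is not batched

        Id(long ledgerId, long entryId, int batchIndex) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.batchIndex = batchIndex;
        }
    }

    // Compares ledgerId first, then entryId; returns 0 if both are equal.
    private static int comparePosition(Id a, Id b) {
        int c = Long.compare(a.ledgerId, b.ledgerId);
        return c != 0 ? c : Long.compare(a.entryId, b.entryId);
    }

    // Behavior described in the thread: for the same ledger and entry,
    // the batched id is always greater than the non-batched id.
    static int legacyCompare(Id a, Id b) {
        int c = comparePosition(a, b);
        return c != 0 ? c : Integer.compare(a.batchIndex, b.batchIndex);
    }

    // Ordering argued for in the thread: the non-batched id stands for the
    // whole entry, so it sorts after every batched id of that entry.
    static int proposedCompare(Id a, Id b) {
        int c = comparePosition(a, b);
        if (c != 0) {
            return c;
        }
        int ai = a.batchIndex == NO_BATCH ? Integer.MAX_VALUE : a.batchIndex;
        int bi = b.batchIndex == NO_BATCH ? Integer.MAX_VALUE : b.batchIndex;
        return Integer.compare(ai, bi);
    }

    public static void main(String[] args) {
        Id wholeEntry = new Id(1, 5, NO_BATCH);
        Id batched = new Id(1, 5, 3);

        // Negative: the legacy ordering puts the whole entry first.
        System.out.println(legacyCompare(wholeEntry, batched));
        // Positive: the proposed ordering puts the whole entry last.
        System.out.println(proposedCompare(wholeEntry, batched));
    }
}
```

Both orderings agree whenever the ledger id or entry id differs; they
only disagree in the same-entry case that this thread is about.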

Thanks,
Bo

Yunze Xu wrote on Thu, Dec 8, 2022 at 20:43:
>
> Actually I'm refactoring the MessageId related code [1], whose current
> implementations are very messy from my perspective. My solution to
> this issue is adding two compare methods, one of them is the "wrong"
> implementation and used in `MessageId#compareTo` to avoid the breaking
> change. See the `legacyCompare` and `compare` methods.
>
> ```java
> // The legacy compare method, which treats the non-batched message id
> // as preceding the batched message id.
> // However, this behavior is wrong because a non-batched message id
> // represents an entry, while a batched message id
> // represents a single message in the entry, which should precede the
> // message id.
> // Keep this implementation just for backward compatibility when users
> // compare two message ids.
> static int legacyCompare(MessageIdDataInterface lhs,
> MessageIdDataInterface rhs) { /* ... */ }
>
> static int compare(MessageIdDataInterface lhs, MessageIdDataInterface
> rhs) { /* ... */ }
> ```
>
> [1] https://github.com/BewareMyPower/pulsar/pull/11/files
>
> Thanks,
> Yunze
>
> On Thu, Dec 8, 2022 at 7:22 PM 丛搏  wrote:
> >
> > Hi, Yunze:
> > If we don't change this behavior, we must pay special attention when
> > coding in `pulsar-client`, because it is a point that is easy to
> > overlook. Its impact may be more serious than the "wrong" behavior
> > a user gets by calling the current compareTo() method manually. I
> > don't think this is a breaking change. On the contrary, it is a bug
> > that needs to be fixed, because we cannot guarantee that everyone will
> > notice the compareTo() problem in time when writing code or reviewing
> > a PR. The current implementation is very counterintuitive.
> >
> > Thanks,
> > bo
> >
> > Yunze Xu wrote on Thu, Dec 8, 2022 at 18:02:
> > >
> > > Actually, from the user side, this comparison would never happen.
> > > Users could never receive two MessageId objects with the same ledger
> > > id, entry id while the batch index fields are different. This
> > > comparison could only exist in the `pulsar-client` implementation.
> > >
> > > If users touch the case, the MessageId object must be created
> > > manually, which is a hack. The "wrong" behavior might be used. So my
> > > perspective is that we should not change this behavior.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Thu, Dec 8, 2022 at 5:36 PM 丛搏  wrote:
> > > >
> > > > Hi, all:
> > > >
> > > > does anyone have any suggestions?
> > > >
> > > > Thanks,
> > > > bo
> > > >
> > > > 丛搏 wrote on Mon, Nov 21, 2022 at 18:57:
> > > > >
> > > > > Hello, Pulsar community:
> > > > >
> > > > > > Currently, when a `BatchMessageIdImpl` and a `MessageIdImpl` have the
> > > > > > same `ledgerId` and `EntryId` and one is compared with the other, the
> > > > > > `BatchMessageIdImpl` is always greater than the `MessageIdImpl`.
> > > > > > See:
> > > > > > https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/BatchMessageIdImpl.java#L71-L74
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MessageIdImpl.java#L219-L228
> > > > > >
> > > > > > But when we use it, we may assume `MessageIdImpl` is greater than
> > > > > > `BatchMessageIdImpl` with the same `ledgerId` and `EntryId`. This causes
> > > > > > a lot of bugs. I think we need to change this `compareTo()` method.
> > > > > > Although it is a public API, I don't think this is a breaking change; it
> > > > > > is a bug that needs to be fixed.
> > > > > > For example, https://github.com/apache/pulsar/pull/18486 needs to add
> > > > > > separate logic for compareTo().
> > > > >
> > > > > Please leave your thoughts, thanks.
> > > > >
> > > > > Thanks,
> > > > > bo


Re: [VOTE] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-14 Thread
+1 (non-binding)

Thanks,
Bo

PengHui Li wrote on Wed, Dec 14, 2022 at 19:12:
>
> +1 (binding)
>
> - Penghui
>
> On Sun, Dec 11, 2022 at 6:36 AM Enrico Olivelli  wrote:
>
> > +1 (binding)
> >
> > Enrico
> >
> > On Fri, Dec 9, 2022, 10:41 Jiaqi Shen wrote:
> >
> > > +1(non-binding)
> > >
> > > Thanks,
> > > Jiaqi Shen
> > >
> > >
> > >  wrote on Mon, Dec 5, 2022 at 15:23:
> > >
> > > > +1(non-binding)
> > > >
> > > > Best,
> > > > Mattison
> > > > On Dec 5, 2022, 15:09 +0800, Zike Yang wrote:
> > > > > +1(non-binding)
> > > > >
> > > > > Best,
> > > > > Zike Yang
> > > > >
> > > > > On Mon, Dec 5, 2022 at 2:41 PM Baodi Shi wrote:
> > > > > >
> > > > > > +1(non-binding)
> > > > > >
> > > > > > Thanks,
> > > > > > Baodi Shi
> > > > > >
> > > > > > > > On Dec 5, 2022, at 12:51, Yunze Xu wrote:
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I'm starting the vote for PIP-224: Introduce TopicMessageId for
> > > > > > > > consumer's MessageId related APIs:
> > > > > > > > https://github.com/apache/pulsar/issues/18616
> > > > > > > >
> > > > > > > > Here is the discussion thread:
> > > > > > > >
> > > > > > > > https://lists.apache.org/thread/jhqy65cdyxzmmxnfsjm8rv9pbk76noxy
> > > > > > > >
> > > > > > > > The vote will be open for at least 3 days.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Yunze
> > > > > >
> > > >
> > >
> >

