Re: [DISCUSS] Pulsar Protocol For Client Timeouts and Creating Producers

2021-12-02 Thread Dave Fisher



> On Dec 1, 2021, at 10:21 AM, Michael Marshall  wrote:
> 
> Following up here, I am still in need of reviews for PR [0]. It
> introduces an important clarification to the Pulsar Protocol Spec.
> Please take a look, if you are able.
> 
> [0] - https://github.com/apache/pulsar/pull/12948

I added some suggested reviewers to the PR.

Regards,
Dave

> 
> Thanks!
> Michael
> 
> On Tue, Nov 23, 2021 at 1:10 PM Michael Marshall  
> wrote:
>> 
>> I created a PR to update the protocol's documentation:
>> https://github.com/apache/pulsar/pull/12948. Please take a look, if
>> you're able.
>> 
>> Once this PR is accepted/merged, I will follow up with an update to
>> the Java client.
>> 
>> - Michael
>> 
>> On Thu, Nov 18, 2021 at 1:29 PM Michael Marshall  
>> wrote:
>>> 
>>> I view this as an edge case of the Pulsar Protocol that requires
>>> clarification. Once we clarify the spec, we can update the clients to
>>> conform to the spec. I don't think we need to give end users control
>>> over this small part of the protocol.
>>> 
>>> After reviewing the design a bit more, I think we should update the
>>> Java client to send the `CloseProducer` command, as well as update the
>>> spec to recommend this implementation. While the `ServerCnx` class
>>> "works" both ways, the current Java client implementation leads to
>>> unnecessary calls, unnecessary errors, and a longer time to recovery.
>>> 
>>> Specifically, if the client fails to send a `CloseProducer` command,
>>> it ends up getting into a sequence of retries where each new
>>> `Producer` command receives an immediate `ErrorResponse` because the
>>> `ServerCnx` already has a pending producer. By sending a
>>> `CloseProducer` command, the client gives the broker permission to
>>> stop keeping track of the original create producer request. It also
>>> means that if the topic eventually loads, the broker will respond to
>>> the right request id with a `ProducerSuccessResponse` command.
>>> 
>>> I will follow up with an update to the client and the protocol spec,
>>> unless there are any objections.
>>> 
>>> Thanks,
>>> Michael
>>> 
>>> On Thu, Nov 18, 2021 at 12:25 PM Neng Lu  wrote:
 
 How about making the behavior when timeout configurable? And by default, 
 it will send the `CloseProducer` cmd?
 
 On 2021/11/17 05:52:21 Michael Marshall wrote:
> Hi All,
> 
> I noticed that the `ServerCnxTest#testCreateProducerTimeout` test
> indicates that a producer will send a `CloserProducer` command when
> producer creation times out for the client.
> 
> The Java client does not follow this protocol. When the producer
> creation times out, it just schedules another attempt to create the
> producer.
> 
> The C++ client does follow this protocol and is annotated with the
> following comment:
> 
>> // Creating the producer has timed out. We need to ensure the broker 
>> closes the producer
>> // in case it was indeed created, otherwise it might prevent new create 
>> producer operation,
>> // since we are not closing the connection
> 
> I don't see anything in our official protocol spec indicating the
> "right" behavior. Given the test coverage, it seems like the initial
> design was to expect a `CloseProducer` command. However, I also see that
> the broker's `ServerCnx` class will reply to a redundant `Producer`
> command with a `ProducerSuccess` command, as long as the producer
> is already created.
> 
> Should I submit a PR to update the Java client to send a
> `CloseProducer` command when a `Producer` command times out?
> 
> Thanks,
> Michael
> 



Re: [DISCUSS] Pulsar Protocol For Client Timeouts and Creating Producers

2021-12-01 Thread Michael Marshall
Following up here, I am still in need of reviews for PR [0]. It
introduces an important clarification to the Pulsar Protocol Spec.
Please take a look, if you are able.

[0] - https://github.com/apache/pulsar/pull/12948

Thanks!
Michael

On Tue, Nov 23, 2021 at 1:10 PM Michael Marshall  wrote:
>
> I created a PR to update the protocol's documentation:
> https://github.com/apache/pulsar/pull/12948. Please take a look, if
> you're able.
>
> Once this PR is accepted/merged, I will follow up with an update to
> the Java client.
>
> - Michael
>
> On Thu, Nov 18, 2021 at 1:29 PM Michael Marshall  
> wrote:
> >
> > I view this as an edge case of the Pulsar Protocol that requires
> > clarification. Once we clarify the spec, we can update the clients to
> > conform to the spec. I don't think we need to give end users control
> > over this small part of the protocol.
> >
> > After reviewing the design a bit more, I think we should update the
> > Java client to send the `CloseProducer` command, as well as update the
> > spec to recommend this implementation. While the `ServerCnx` class
> > "works" both ways, the current Java client implementation leads to
> > unnecessary calls, unnecessary errors, and a longer time to recovery.
> >
> > Specifically, if the client fails to send a `CloseProducer` command,
> > it ends up getting into a sequence of retries where each new
> > `Producer` command receives an immediate `ErrorResponse` because the
> > `ServerCnx` already has a pending producer. By sending a
> > `CloseProducer` command, the client gives the broker permission to
> > stop keeping track of the original create producer request. It also
> > means that if the topic eventually loads, the broker will respond to
> > the right request id with a `ProducerSuccessResponse` command.
> >
> > I will follow up with an update to the client and the protocol spec,
> > unless there are any objections.
> >
> > Thanks,
> > Michael
> >
> > On Thu, Nov 18, 2021 at 12:25 PM Neng Lu  wrote:
> > >
> > > How about making the behavior when timeout configurable? And by default, 
> > > it will send the `CloseProducer` cmd?
> > >
> > > On 2021/11/17 05:52:21 Michael Marshall wrote:
> > > > Hi All,
> > > >
> > > > I noticed that the `ServerCnxTest#testCreateProducerTimeout` test
> > > > indicates that a producer will send a `CloserProducer` command when
> > > > producer creation times out for the client.
> > > >
> > > > The Java client does not follow this protocol. When the producer
> > > > creation times out, it just schedules another attempt to create the
> > > > producer.
> > > >
> > > > The C++ client does follow this protocol and is annotated with the
> > > > following comment:
> > > >
> > > > > // Creating the producer has timed out. We need to ensure the broker 
> > > > > closes the producer
> > > > > // in case it was indeed created, otherwise it might prevent new 
> > > > > create producer operation,
> > > > > // since we are not closing the connection
> > > >
> > > > I don't see anything in our official protocol spec indicating the
> > > > "right" behavior. Given the test coverage, it seems like the initial
> > > > design was to expect a `CloseProducer` command. However, I also see that
> > > > the broker's `ServerCnx` class will reply to a redundant `Producer`
> > > > command with a `ProducerSuccess` command, as long as the producer
> > > > is already created.
> > > >
> > > > Should I submit a PR to update the Java client to send a
> > > > `CloseProducer` command when a `Producer` command times out?
> > > >
> > > > Thanks,
> > > > Michael
> > > >


Re: [DISCUSS] Pulsar Protocol For Client Timeouts and Creating Producers

2021-11-23 Thread Michael Marshall
I created a PR to update the protocol's documentation:
https://github.com/apache/pulsar/pull/12948. Please take a look, if
you're able.

Once this PR is accepted/merged, I will follow up with an update to
the Java client.

- Michael

On Thu, Nov 18, 2021 at 1:29 PM Michael Marshall  wrote:
>
> I view this as an edge case of the Pulsar Protocol that requires
> clarification. Once we clarify the spec, we can update the clients to
> conform to the spec. I don't think we need to give end users control
> over this small part of the protocol.
>
> After reviewing the design a bit more, I think we should update the
> Java client to send the `CloseProducer` command, as well as update the
> spec to recommend this implementation. While the `ServerCnx` class
> "works" both ways, the current Java client implementation leads to
> unnecessary calls, unnecessary errors, and a longer time to recovery.
>
> Specifically, if the client fails to send a `CloseProducer` command,
> it ends up getting into a sequence of retries where each new
> `Producer` command receives an immediate `ErrorResponse` because the
> `ServerCnx` already has a pending producer. By sending a
> `CloseProducer` command, the client gives the broker permission to
> stop keeping track of the original create producer request. It also
> means that if the topic eventually loads, the broker will respond to
> the right request id with a `ProducerSuccessResponse` command.
>
> I will follow up with an update to the client and the protocol spec,
> unless there are any objections.
>
> Thanks,
> Michael
>
> On Thu, Nov 18, 2021 at 12:25 PM Neng Lu  wrote:
> >
> > How about making the behavior when timeout configurable? And by default, it 
> > will send the `CloseProducer` cmd?
> >
> > On 2021/11/17 05:52:21 Michael Marshall wrote:
> > > Hi All,
> > >
> > > I noticed that the `ServerCnxTest#testCreateProducerTimeout` test
> > > indicates that a producer will send a `CloserProducer` command when
> > > producer creation times out for the client.
> > >
> > > The Java client does not follow this protocol. When the producer
> > > creation times out, it just schedules another attempt to create the
> > > producer.
> > >
> > > The C++ client does follow this protocol and is annotated with the
> > > following comment:
> > >
> > > > // Creating the producer has timed out. We need to ensure the broker 
> > > > closes the producer
> > > > // in case it was indeed created, otherwise it might prevent new create 
> > > > producer operation,
> > > > // since we are not closing the connection
> > >
> > > I don't see anything in our official protocol spec indicating the
> > > "right" behavior. Given the test coverage, it seems like the initial
> > > design was to expect a `CloseProducer` command. However, I also see that
> > > the broker's `ServerCnx` class will reply to a redundant `Producer`
> > > command with a `ProducerSuccess` command, as long as the producer
> > > is already created.
> > >
> > > Should I submit a PR to update the Java client to send a
> > > `CloseProducer` command when a `Producer` command times out?
> > >
> > > Thanks,
> > > Michael
> > >


Re: [DISCUSS] Pulsar Protocol For Client Timeouts and Creating Producers

2021-11-18 Thread Michael Marshall
I view this as an edge case of the Pulsar Protocol that requires
clarification. Once we clarify the spec, we can update the clients to
conform to the spec. I don't think we need to give end users control
over this small part of the protocol.

After reviewing the design a bit more, I think we should update the
Java client to send the `CloseProducer` command, as well as update the
spec to recommend this implementation. While the `ServerCnx` class
"works" both ways, the current Java client implementation leads to
unnecessary calls, unnecessary errors, and a longer time to recovery.

Specifically, if the client fails to send a `CloseProducer` command,
it ends up getting into a sequence of retries where each new
`Producer` command receives an immediate `ErrorResponse` because the
`ServerCnx` already has a pending producer. By sending a
`CloseProducer` command, the client gives the broker permission to
stop keeping track of the original create producer request. It also
means that if the topic eventually loads, the broker will respond to
the right request id with a `ProducerSuccessResponse` command.

I will follow up with an update to the client and the protocol spec,
unless there are any objections.

Thanks,
Michael

On Thu, Nov 18, 2021 at 12:25 PM Neng Lu  wrote:
>
> How about making the behavior when timeout configurable? And by default, it 
> will send the `CloseProducer` cmd?
>
> On 2021/11/17 05:52:21 Michael Marshall wrote:
> > Hi All,
> >
> > I noticed that the `ServerCnxTest#testCreateProducerTimeout` test
> > indicates that a producer will send a `CloserProducer` command when
> > producer creation times out for the client.
> >
> > The Java client does not follow this protocol. When the producer
> > creation times out, it just schedules another attempt to create the
> > producer.
> >
> > The C++ client does follow this protocol and is annotated with the
> > following comment:
> >
> > > // Creating the producer has timed out. We need to ensure the broker 
> > > closes the producer
> > > // in case it was indeed created, otherwise it might prevent new create 
> > > producer operation,
> > > // since we are not closing the connection
> >
> > I don't see anything in our official protocol spec indicating the
> > "right" behavior. Given the test coverage, it seems like the initial
> > design was to expect a `CloseProducer` command. However, I also see that
> > the broker's `ServerCnx` class will reply to a redundant `Producer`
> > command with a `ProducerSuccess` command, as long as the producer
> > is already created.
> >
> > Should I submit a PR to update the Java client to send a
> > `CloseProducer` command when a `Producer` command times out?
> >
> > Thanks,
> > Michael
> >


Re: [DISCUSS] Pulsar Protocol For Client Timeouts and Creating Producers

2021-11-18 Thread Neng Lu
How about making the behavior when timeout configurable? And by default, it 
will send the `CloseProducer` cmd?

On 2021/11/17 05:52:21 Michael Marshall wrote:
> Hi All,
> 
> I noticed that the `ServerCnxTest#testCreateProducerTimeout` test
> indicates that a producer will send a `CloserProducer` command when
> producer creation times out for the client.
> 
> The Java client does not follow this protocol. When the producer
> creation times out, it just schedules another attempt to create the
> producer.
> 
> The C++ client does follow this protocol and is annotated with the
> following comment:
> 
> > // Creating the producer has timed out. We need to ensure the broker closes 
> > the producer
> > // in case it was indeed created, otherwise it might prevent new create 
> > producer operation,
> > // since we are not closing the connection
> 
> I don't see anything in our official protocol spec indicating the
> "right" behavior. Given the test coverage, it seems like the initial
> design was to expect a `CloseProducer` command. However, I also see that
> the broker's `ServerCnx` class will reply to a redundant `Producer`
> command with a `ProducerSuccess` command, as long as the producer
> is already created.
> 
> Should I submit a PR to update the Java client to send a
> `CloseProducer` command when a `Producer` command times out?
> 
> Thanks,
> Michael
>