RE: Re: Re: [Discuss] PIP-319 Unblock stuck Key_Shared subscription after consumer reconnect

2023-12-03 Thread Nikolai

Hi Rajan,


Thanks for sharing the link! I will follow the PR.


Best Regards,

Nikolai


On 2023/12/02 01:21:04 Rajan Dhabalia wrote:
> Hi Nikolai,
>
> I completely understand your concern and pain. But let's understand
> ordering semantics for exclusive/fail-over subscription. This 
subscription

> makes sure to dispatch new messages which the broker is reading using
> readPosition must be in order but that doesn't mean the broker will never
> handle redelivery of unack message and broker will not stop 
dispatching by

> considering the end of the world. However, key-Shared sub thinks it's the
> end of the world. key-shared sub extends the shared sub behavior with
> additional speciality of consumer dispatch affinity based on message key.
> Broker tries its best to perform this responsibility but it's still not
> guaranteed as it will be really costly to give such guarantee and if one
> requires it then we can use a failover sub. In shared or even in other
> subscriptions, brokers don't restrict or stop dispatching of redelivery
> messages.
> However, in key-shared sub, broker stops forever dispatching new and
> redelivery messages in certain conditions and additionally broker 
goes into

> an infinite iterative sequence of reading duplicate cold messages and
> discarding them which causes IO and CPU impact both on broker and bookie.
> Pulsar must not have such broken and incorrect semantic behavior in the
> system. I have created an issue:
> https://github.com/apache/pulsar/issues/21656 which has a simple unit 
test

> case to reproduce this behavior and that's what is happening in multiple
> production Pulsar systems and users are not happy with it.
>
> Therefore, we should not stop dispatching and restricting redeliver
> messages which will automatically address almost all stuck issues and the
> broker can still follow the ordering guarantee for new messages which it
> promises for other subs as well.
>
> Thanks,
> Rajan
>
> On Fri, Dec 1, 2023 at 12:03 AM Nikolai  wrote:
>
> > Hi Rajan,
> >
> > Thanks for the review!
> >
> > It looks like the root cause of "stuck" keyword is the nature of
> > Key_Shared subscription which must guarantee messages ordering. I
> > understand it is really hard to achieve such guarantees without having
> > separate physical partitions but this feature is very essential for 
many

> > use cases. I also understand that the solution I've provided is more a
> > hack than a solid solution. I was trying to touch as little code as I
> > can and, unfortunately, I see no other options to avoid stuck 
dispatches

> > in the current implementation approach.
> >
> > I also want to highlight that the current implementation leads entire
> > system to hang due to a single message corruption in case consumers
> > disconnects. This breaks one of a big Pulsar advantage: independent 
keys

> > processing.
> >
> > Do you have plans for Key_Shared dispatcher rework? I would be happy to
> > contribute to such a project if I can help somehow.
> >
> >
> > Best Regards,
> >
> > Nikolai
> >
> >
> >
> > On 2023/12/01 06:34:59 Rajan Dhabalia wrote:
> > > Hi Nikolai,
> > >
> > > Thanks for bringing this to the attention. I have seen multiple PIP
> > and bug
> > > fixes for the Key-Shared subscription and I feel Key_Shared 
subscription

> > > dispatcher is developed with multiple fundamental issues which
> > required to
> > > introduce scheduled-task to unblock stuck reads, introducing stuck
> > delivery
> > > flag, and now we are seeing at least 2-3 PIP to add few more hacks to
> > > control such stuck dispatch. After reviewing code, it seems
> > >
> >
> > 
PersistentStickyKeyDispatcherMultipleConsumers::getRestrictedMaxEntriesForConsumer(..)

> > > function can still return 0 dispatch messages with multiple different
> > edge
> > > cases and fundamental dispatch is not clean at all. This is the only
> > > dispatcher introduced after open sourcing Pulsar and unfortunately it
> > was
> > > not developed with a cleaner approach. I understand the pain of such
> > stuck
> > > dispatch and even the current release is having a bugs where 
dispatch is
> > > getting stuck and many production systems are struggling with it 
and you

> > > must be suggesting this PIP to get work around for this stuck issue.
> > But I
> > > don't think this is the right approach to follow.
> > > I think the correct approach is to get rid of this "stuck" keyword
> > from the
> > > dispatcher becau

RE: Re: [Discuss] PIP-319 Unblock stuck Key_Shared subscription after consumer reconnect

2023-12-01 Thread Nikolai

Hi Rajan,

Thanks for the review!

It looks like the root cause of "stuck" keyword is the nature of 
Key_Shared subscription which must guarantee messages ordering. I 
understand it is really hard to achieve such guarantees without having 
separate physical partitions but this feature is very essential for many 
use cases. I also understand that the solution I've provided is more a 
hack than a solid solution. I was trying to touch as little code as I 
can and, unfortunately, I see no other options to avoid stuck dispatches 
in the current implementation approach.


I also want to highlight that the current implementation leads entire 
system to hang due to a single message corruption in case consumers 
disconnects. This breaks one of a big Pulsar advantage: independent keys 
processing.


Do you have plans for Key_Shared dispatcher rework? I would be happy to 
contribute to such a project if I can help somehow.



Best Regards,

Nikolai



On 2023/12/01 06:34:59 Rajan Dhabalia wrote:
> Hi Nikolai,
>
> Thanks for bringing this to the attention. I have seen multiple PIP 
and bug

> fixes for the Key-Shared subscription and I feel Key_Shared subscription
> dispatcher is developed with multiple fundamental issues which 
required to
> introduce scheduled-task to unblock stuck reads, introducing stuck 
delivery

> flag, and now we are seeing at least 2-3 PIP to add few more hacks to
> control such stuck dispatch. After reviewing code, it seems
> 
PersistentStickyKeyDispatcherMultipleConsumers::getRestrictedMaxEntriesForConsumer(..)
> function can still return 0 dispatch messages with multiple different 
edge

> cases and fundamental dispatch is not clean at all. This is the only
> dispatcher introduced after open sourcing Pulsar and unfortunately it was
> not developed with a cleaner approach. I understand the pain of such 
stuck

> dispatch and even the current release is having a bugs where dispatch is
> getting stuck and many production systems are struggling with it and you
> must be suggesting this PIP to get work around for this stuck issue. 
But I

> don't think this is the right approach to follow.
> I think the correct approach is to get rid of this "stuck" keyword 
from the

> dispatcher because having "stuck" concept in dispatcher means it has a
> known flow and bug which we are not solving but we are adding more 
debt to
> perform the work around. We never had any such concept in other 
dispatchers

> which was introduced when Pulsar was introduced originally and it just
> works with clear semantics and permit flow mechanism , and we should 
follow

> the same practice for this subscription.
>
> Thanks,
> Rajan
>
> On Thu, Nov 30, 2023 at 9:18 AM Nikolai  wrote:
>
> > Hello!
> >
> > I submitted a new PIP to add a configuration option which allows to 
skip

> > blocking recently joined consumers for Key_Shared subscriptions
> >
> > It introduces additional memory consumption, so it will be disabled by
> > default. It would fix issues like
> > https://github.com/apache/pulsar/issues/21199
> >
> > Link to the PIP: https://github.com/apache/pulsar/pull/21615
> >
> > PR with the PIP implementation:
> > https://github.com/apache/pulsar/pull/21579
> >
> >
> > Best Regards,
> >
> > Nikolai
> >
> >
>


[Discuss] PIP-319 Unblock stuck Key_Shared subscription after consumer reconnect

2023-11-30 Thread Nikolai

Hello!

I submitted a new PIP to add a configuration option which allows to skip 
blocking recently joined consumers for Key_Shared subscriptions


It introduces additional memory consumption, so it will be disabled by 
default. It would fix issues like 
https://github.com/apache/pulsar/issues/21199


Link to the PIP: https://github.com/apache/pulsar/pull/21615

PR with the PIP implementation: https://github.com/apache/pulsar/pull/21579


Best Regards,

Nikolai