We have discussed server-side logic on the community list before. I would
like to set the specific proposal in this PIP aside and address what this
PIP implicitly changes in the core Pulsar design. I want to have an
explicit discussion on that topic: what is the path for server-side
business logic in Pulsar?

Pulsar has been designed to do a few things very well. It is designed to
be run as a hosted service, meaning it can be scaled horizontally by adding
storage or compute hardware as traffic or tenants on the service grow. It
is optimized for data streaming at high throughput and scale, and it does
multi-tenancy extremely well. Part of that design is that there is no
business logic in the data flow path. Because business logic lives outside
the core data flow path, the core is optimized for data flow: do plain
byte movement - no ser/de, no byte copy, no computations - and do it
extremely well. Other systems, like Kafka and Kinesis, have taken the same
approach: no server-side business logic.

This particular PIP may or may not be expensive on the server. The next
PIP could be, and once this concept is allowed there is no rationale for
refusing any other kind of business logic in the broker.

Selective consumers are an anti-pattern for data flow systems. There are
systems out there that support business logic in the data flow path, and
they don't scale. Take AMQ as an example: it allows JMS/SQL-92 expressions
on the server side. Once the door to this anti-pattern is opened, there is
no rhyme or reason to deny anything, up to and including full-blown SQL
query evaluation in the dispatch path.

So why not allow that? Why not allow a full blown expression evaluation in
the data flow path?

Unfortunately, there is no way to answer this without bringing up the
conflict of interest between small users and large-scale users running
multi-tenant hosted Pulsar at huge traffic volumes.

For low-scale installations with one or a few tenants, efficiency of flow,
latency and throughput are not the driving concern. In a small cluster,
the cost and scale implications of executing server-side business logic
are minimal in absolute terms.

For large-scale users (like me) this is a no-go. Once users can introduce
business logic into the broker, a number of problems make it very
difficult to run a hosted platform with predictable SLAs. These come on
top of the performance and cost implications.

First, broker throughput and performance become unpredictable. The
current Pulsar load model, which the load manager uses for load balancing,
becomes unusable. Worse, there is no pre-computed model that the load
manager could use instead. Since producers and consumers decide
arbitrarily what the business logic is, and the computation can change
based on the data, the model itself becomes dynamic and the load manager
has to rebuild it any time a user updates the business logic. That is a
tall order, worth years of work to implement.

Second, this introduces the noisy neighbor issue. Two tenants will happily
run on the same broker until one of them decides to change the logic on a
subscription, and suddenly the quality of service for the other tenant
degrades because the broker is impacted. The operator of the cluster now
has to get involved out of the blue because one tenant made a change.
Basically, any tenant can disrupt the system by triggering additional
business logic in the server, or through specific data patterns that make
the business logic expensive on the server.

Third, this makes capacity provisioning impossible. Today Pulsar users can
be provisioned on flow: bandwidth in/out and messages in/out. With
server-side business logic, there is some unpredictable overhead that
needs to be accounted for in the capacity calculation.
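
To make the provisioning point concrete, here is a rough sketch in Java.
The per-broker limits, the tenant figures and the cpuMicrosPerMsg term are
all invented for illustration; none of them comes from Pulsar or its load
manager. The point is only that a flow-based share can be computed up
front, while the share with server-side logic depends on a per-message
cost that the operator does not control.

```java
// Hypothetical capacity model; every limit and figure here is illustrative.
final class CapacitySketch {
    // Assumed per-broker limits (not Pulsar defaults).
    static final double BROKER_MBPS = 1000.0;
    static final double BROKER_MSGS_PER_SEC = 500_000.0;
    static final double BROKER_CPU_MICROS_PER_SEC = 8_000_000.0; // e.g. 8 cores

    /** Fraction of one broker a tenant needs when only flow matters. */
    static double flowShare(double mbps, double msgsPerSec) {
        return Math.max(mbps / BROKER_MBPS, msgsPerSec / BROKER_MSGS_PER_SEC);
    }

    /** Same tenant, plus a per-message CPU cost for server-side logic. */
    static double shareWithLogic(double mbps, double msgsPerSec,
                                 double cpuMicrosPerMsg) {
        double cpuShare = msgsPerSec * cpuMicrosPerMsg / BROKER_CPU_MICROS_PER_SEC;
        return Math.max(flowShare(mbps, msgsPerSec), cpuShare);
    }

    public static void main(String[] args) {
        double mbps = 100.0;
        double msgsPerSec = 100_000.0;
        System.out.println("flow only:       " + flowShare(mbps, msgsPerSec));
        System.out.println("cheap filter:    " + shareWithLogic(mbps, msgsPerSec, 5.0));
        System.out.println("expensive filter: " + shareWithLogic(mbps, msgsPerSec, 200.0));
    }
}
```

With these made-up numbers the tenant needs a fifth of a broker on flow
alone, still a fifth with a 5 microsecond filter, and two and a half
brokers with a 200 microsecond filter, even though its traffic never
changed.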

We, who run Pulsar as a hosted service, do not want any of our tenants to
introduce server-side logic into the service, because doing it well
requires a load balancer that can continuously and dynamically adjust its
load model and capacity model (perhaps based on ML over the traffic).
Building such a system would turn Pulsar from a data streaming project
into a load balancer/resource manager project. The only viable alternative
would be to give each tenant their own dedicated servers - at which point
all claims to multi-tenancy in Pulsar should be dropped.


So large multi-tenant clusters will have big problems with the addition of
business logic into the broker.

But this problem - Pulsar users attempting to add server-side logic into
Pulsar - is not going to go away. There will always be yet another new user
who will ask for adding 'one more simple implementation' of server-side
business logic into the broker.

My suggestion here is simple. Make the dispatcher a configurable module.
Let users who want server-side logic implement their computational logic
in custom dispatchers, packaged as loadable modules, and use them as they
need. Users can then implement whatever logic they need without depending
on Pulsar, and the code and module will remain in user-land rather than
Pulsar-land. No one will be required to contribute their dispatchers to
Pulsar, but dispatchers that could have widespread use can be contributed
back into Pulsar (like connectors).
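
A rough sketch of what a loadable dispatcher could look like, in Java.
Everything named here (MessageDispatcher, PassThroughDispatcher,
NonEmptyPayloadDispatcher, DispatcherLoader) is hypothetical and is not an
existing Pulsar interface. Plain reflection stands in for whatever module
loading mechanism would actually be used, such as the NAR classloader
mentioned further down this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in contract for illustration; not an existing Pulsar API.
interface MessageDispatcher {
    /** Returns the subset of entries that should be sent to the consumer. */
    List<byte[]> dispatch(List<byte[]> entries);
}

// Default dispatcher: plain byte movement, payloads are never inspected.
final class PassThroughDispatcher implements MessageDispatcher {
    @Override
    public List<byte[]> dispatch(List<byte[]> entries) {
        return entries;
    }
}

// Example of user-land business logic: drop entries with an empty payload.
final class NonEmptyPayloadDispatcher implements MessageDispatcher {
    @Override
    public List<byte[]> dispatch(List<byte[]> entries) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] entry : entries) {
            if (entry.length > 0) {
                kept.add(entry);
            }
        }
        return kept;
    }
}

// Instantiates the dispatcher class named in configuration via reflection.
final class DispatcherLoader {
    static MessageDispatcher load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(MessageDispatcher.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot load dispatcher " + className, e);
        }
    }
}

final class DispatcherDemo {
    public static void main(String[] args) {
        List<byte[]> entries = new ArrayList<>();
        entries.add(new byte[] {1, 2, 3});
        entries.add(new byte[0]);

        // The class name would normally come from broker configuration.
        MessageDispatcher dispatcher =
                DispatcherLoader.load(NonEmptyPayloadDispatcher.class.getName());
        System.out.println("delivered: " + dispatcher.dispatch(entries).size());
    }
}
```

In this shape the base dispatcher stays a pure pass-through, and any
filtering cost is confined to clusters whose operators chose to configure
a custom class.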

If this seems suspiciously similar to functions, then yes, it is. Functions
were meant to fulfill this need, but without messing with the dispatcher.
Functions were meant to do business logic outside the hosted service, so
that the service itself is not impacted by random users injecting business
logic into the platform.

But if functions are not acceptable, and users still want to mess with the
dispatcher, what I am proposing is a way to let users do that without
breaking the design goals of Pulsar. That will avoid impacting the core
data flow path for large-system, hosted-service and multi-tenant use cases.

So my vote is not to allow this (and any other server side logic
implementations) into the base dispatcher, but permit these kinds of
changes as configurable dispatchers. I hope I have explained the reasons
for that vote clearly.


Joe


On Mon, Nov 16, 2020 at 10:03 AM Sijie Guo <guosi...@gmail.com> wrote:

> Andre,
>
> I left a comment on the pull request. But I will just copy them here as
> well.
>
> I have a couple of comments and one suggestion.
>
> 1. What is the performance & GC implication with this change? I think most
> of the questions on this pull request are about the performance & GC
> implication. It would be good to show your benchmarking/testing methodology
> and the benchmark results to the community.
>
> 2. How are you going to handle topics with end-to-end encryption enabled?
>
> 3. How do you handle acknowledgment for the messages that have been
> filtered out and never sent to the consumers? I don't see it is discussed
> in the PIP. Especially, how is it related to different subscription types?
>
> One suggestion - If this PIP is approved, my recommendation is to use the
> NAR classloader to load the class. You can check how Pulsar uses NAR
> classloader for other interfaces.
>
> Thanks,
> Sijie
>
> On Mon, Nov 16, 2020 at 2:53 AM Kramer, Andre
> <andre.kra...@softwareag.com> wrote:
>
> > Sure, please feel free to copy the doc to wiki pages. It's mainly text so
> > can be converted easily.
> >
> > Cheers,
> > Andre
> >
> > -----Original Message-----
> > From: Sijie Guo <guosi...@gmail.com>
> > Sent: 13 November 2020 19:08
> > To: Dev <dev@pulsar.apache.org>
> > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> >
> > Andre,
> >
> > Is it possible to put it in a Google Doc (or similar collaboration tool)
> > that allows other people to make comments? Also, it would be easier for
> > the committers to copy the PIP to Pulsar wiki pages.
> >
> > Thanks,
> > Sijie
> >
> > On Fri, Nov 13, 2020 at 2:44 AM Kramer, Andre
> > <andre.kra...@softwareag.com> wrote:
> >
> > > Hi Sijie,
> > >
> > > I had added a PIP style document to the pull request:
> > > https://github.com/andrekramer1/pulsar/blob/consumer-filter2-7-0/PIP-XX%20-%20Consumer-filtering.pdf
> > > Hopefully that could be used to start the discussion?
> > >
> > > Regards,
> > > Andre
> > >
> > > -----Original Message-----
> > > From: Sijie Guo <guosi...@gmail.com>
> > > Sent: 12 November 2020 18:32
> > > To: Dev <dev@pulsar.apache.org>
> > > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> > >
> > > Hi Andre,
> > >
> > > I didn't see the attached writeup. Can you write a PIP for this
> > > feature? Given it is a big feature, it would be good to discuss it
> > > through a PIP.
> > >
> > > - Sijie
> > >
> > > On Thu, Nov 12, 2020 at 6:17 AM Kramer, Andre
> > > <andre.kra...@softwareag.com> wrote:
> > >
> > > > Hello everyone,
> > > >
> > > >
> > > >
> > > > We at Software AG have prototyped adding filtering on Consumer
> > > > subscriptions in the Pulsar broker and are submitting our changes
> > > > for consideration under Apache 2.0 license. Please see pull request
> > > > [Consumer Filtering #8544
> > > > https://github.com/apache/pulsar/pull/8544]
> > > > and attached write up. Comments welcome!
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Andre
> > > >
> > > >
> > > >
> > > > andre.kra...@softwareag.com
> > > > This communication contains information which is confidential and
> > > > may also be privileged. It is for the exclusive use of the intended
> > > > recipient(s). If you are not the intended recipient(s), please note
> > > > that any distribution, copying, or use of this communication or the
> > > > information in it, is strictly prohibited. If you have received this
> > > > communication in error please notify us by e-mail and then delete
> > > > the e-mail and any copies of it.
> > > > Software AG (UK) Limited Registered in England & Wales 1310740 -
> > > > http://www.softwareag.com/uk
> > > >
> > >
> >
>
