>> If so, I think it could be a good reason for introducing binary
protocol support here. For the security sensitive users like financial
application. Usually they will try to reduce the dependencies (less
dependencies, less
CVEs and the exposed service endpoints. For example, the flink connector
also have pulsar-admin dependency but some of the users want to remove it.

Yes, that could be another usecase to have such API available for users.
So, I would like to bump up this discussion again and see if we have any
other suggestions or concerns as we have multiple users who need it and
would like to move forward with this API to serve those usecases.

Thanks,
Rajan

On Mon, Jun 26, 2023 at 9:03 PM PengHui Li <peng...@apache.org> wrote:

> Hi Rajan,
>
> Thanks for the explanation
>
> > This feature helps them to avoid multiple different extra
> efforts
>
> If I understand correctly. You want to say users don't want to
> add the pulsar-admin dependency or the the cluster don't want
> to expose the REST API to external (not the cluster admin or
> tenant admin)?
>
> If so, I think it could be a good reason for introducing binary
> protocol support here. For the security sensitive users like financial
> application.
>
> Usually they will try to reduce the dependencies (less dependencies, less
> CVEs)
> and the exposed service endpoints. For example, the flink connector
> also have pulsar-admin dependency but some of the users want to
> remove it.
>
> I want to say that from the perspective of improving performance,
> it may not be more convincing than the above reason.
>
> Thanks,
> Penghui
>
>
> On Mon, Jun 26, 2023 at 3:37 PM Rajan Dhabalia <rdhaba...@apache.org>
> wrote:
>
> > > I do not deny that binary protocol has performance advantages. But
> maybe
> > the bottleneck is not the protocol level for now.
> >
> > Well, sure. We had serious issue with performance in past over http but
> > this feature we would mainly like to introduce it now for the
> applications
> > to enhance api user accessibility experience where we have these multiple
> > usecases where applications with large number of topics and high fanout
> > consumers would like to fetch stats and stats-internal to retrieve
> various
> > metadata for application startup and managing application state based on
> > managed-ledge states. You can think of producer/consumer stats api which
> is
> > used by many usecases in different scenarios of monitoring or state
> > management. This feature helps them to avoid multiple different extra
> > efforts and performance considerations, which helps to give clean and
> easy
> > experience for their application.
> > I am open to hear about alternative of json string and response schema
> > definition but keeping similar response-schema as admin-api response for
> > stats/stats-internal helps to give consistent view of stats across all
> APIs
> > and transferring json format helps to skip transformation stats
> definition
> > and we don't have to make any wire protocol changes whenever we add any
> new
> > field or state in response which makes this protocol response agnostic
> and
> > do not require maintenance for any changes in stats. So, we can consider
> > any suggestion but we have multiple usecaes which need this API and we
> > should implement it in a way where it can provide a consistent definition
> > view across other APIs without maintenance efforts while making any stat
> > response changes.
> >
> > Thanks,
> > Rajan
> >
> > On Sat, Jun 24, 2023 at 5:51 PM PengHui Li <peng...@apache.org> wrote:
> >
> > > > >> Main reasons are performance (you will see improvement due to http
> > > overhead compare to tcp, limited threads on http-server side and
> > definitely
> > > performance compare to netty-tcp vs http-jetty server), user
> feasibility
> > > (same reason why we have producer/consumer stats using binary protocol
> > > where user can get the stats from the same producer/consumer entity
> > without
> > > having additional client creation for the stats), scale (allowing large
> > > number of requests over binary protocol compare to http is more
> > scalable).
> > >
> > > I do not deny that binary protocol has performance advantages.
> > > But maybe the bottleneck is not the protocol level for now.
> > >
> > > IMO, we should have a clear performance expectation for getting the
> topic
> > > stats
> > > If you want to achieve 10k of requests per second.
> > > Maybe the protocol is not the bottleneck. Of course, if you want to
> reach
> > > 100k or
> > > even millions of requests on a single server per second.
> > > Exactly, the binary protocol will help. I'm curious whether it is a
> real
> > > case.
> > > Unless you have to query topic stats every second and the broker owned
> > > 100k topics.
> > >
> > > And I also saw some benchmark results for the HTTP servers.
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/HTTPCOMPONENTS/HttpCoreBenchmark
> > > It's not refreshed, but it's instructive. Try other HTTP servers should
> > > also be considered.
> > >
> > > The binary protocol for topic stats is an option. But it also has
> > > disadvantages and negative effects.
> > >
> > > - Make the binary protocol more complicated
> > > - All the clients need to catch up with the new protocol
> > >
> > > > >> SNI is layer-4 tcp layer protocol and does not support over the
> > HTTP.
> > >
> > > But you can just have an HTTP proxy to handle the redirect?
> > > Why SNI is required for getting topic stats?
> > > I'm not sure about the whole story here.
> > >
> > > > Well JSON would be the better option to avoid data conversion and
> copy
> > > for
> > > complex stats/internal-stats complex data-structures at broker and
> client
> > > side, maintaining consistency between stats/internal-stats pojo-schema,
> > and
> > > accessing stats with consistent pojo definition.
> > >
> > > This is from the perspective of developers who are already very
> familiar
> > > with Pulsar.
> > > If you don't know it, just want to create a client based on Pulsar's
> API.
> > > How to do it? For public API, I think a clear definition should be the
> > > first consideration.
> > >
> > > Thanks,
> > > Penghui
> > >
> > >
> > > On Thu, Jun 22, 2023 at 9:13 AM Rajan Dhabalia <rdhaba...@apache.org>
> > > wrote:
> > >
> > > > Please find the response inline.
> > > >
> > > > On Wed, Jun 21, 2023 at 5:53 PM PengHui Li <peng...@apache.org>
> wrote:
> > > >
> > > > > > However, stats retrieval over HTTP API doesn’t work well in use
> > cases
> > > > > when users would like to access this API at a higher scale when a
> > large
> > > > > number of application nodes would like to use it over HTTP which
> > could
> > > > > overload brokers and sometimes makes broker irresponsive and impact
> > > admin
> > > > > API performance. It’s also become difficult when Pulsar is deployed
> > in
> > > > the
> > > > > cloud behind the SNI proxy and applications also want to access
> > > > large-scale
> > > > > stats information periodically over different HTTP port instead it
> > > would
> > > > be
> > > > > better if applications can fetch stats over on same binary protocol
> > for
> > > > > scalability and accessibility reasons.
> > > > >
> > > > > From the motivation of the proposal, resolving the performance
> issue
> > > with
> > > > > the REST API
> > > > > is the goal. Sorry, I haven't run a benchmark for the REST API,
> have
> > > you
> > > > > tested it or could
> > > > > you please figure out where is the bottleneck with the REST API
> > > solution?
> > > > > And if users use
> > > > > the new approach, what is the performance expectation here, get a
> 30%
> > > or
> > > > > 50% improvement?
> > > > >
> > > > >> Main reasons are performance (you will see improvement due to http
> > > > overhead compare to tcp, limited threads on http-server side and
> > > definitely
> > > > performance compare to netty-tcp vs http-jetty server), user
> > feasibility
> > > > (same reason why we have producer/consumer stats using binary
> protocol
> > > > where user can get the stats from the same producer/consumer entity
> > > without
> > > > having additional client creation for the stats), scale (allowing
> large
> > > > number of requests over binary protocol compare to http is more
> > > scalable).
> > > >
> > > >
> > > > > And I'm not sure why it is difficult to fetch the stats behind the
> > SNI
> > > > > proxy, could you please explain
> > > > > more? It will be helpful for the reviewers to understand the issue
> we
> > > > want
> > > > > to resolve.
> > > > >
> > > > >> SNI is layer-4 tcp layer protocol and does not support over the
> > http.
> > > >
> > > >
> > > > > For the client side API changes. You have InternalStatsOption as a
> > > param
> > > > of
> > > > > the method, but
> > > > > I don't see any definition for it. It should be enum? And why not
> > have
> > > a
> > > > > pojo for it? It will be
> > > > > more easier for users to know what should be set and what should
> not.
> > > > >
> > > > I have added that param map after our last discussion which will be
> > enum
> > > > to pass additional flags.
> > > >
> > > >
> > > > > For the response data. The JSON string is not good for
> compatibility.
> > > > And I
> > > > > saw the discussion
> > > > > in the DISCUSSION thread from you and Enrico. But it's not a
> > guaranteed
> > > > > solution from the API's
> > > > > perspective. And for any other clients, cpp, go, rust, they must
> > build
> > > > > their own pojo based on
> > > > > the REST API.
> > > > >
> > > > Well JSON would be the better option to avoid data conversion and
> copy
> > > for
> > > > complex stats/internal-stats complex data-structures at broker and
> > client
> > > > side, maintaining consistency between stats/internal-stats
> pojo-schema,
> > > and
> > > > accessing stats with consistent pojo definition.
> > > >
> > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Tue, Jun 20, 2023 at 3:06 PM Rajan Dhabalia <
> rdhaba...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I would like to start VOTE for :
> > > > > > https://github.com/apache/pulsar/issues/20265
> > > > > >
> > > > > > Thanks,
> > > > > > Rajan
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to