Hey folks,

A post
<https://apachebookkeeper.slack.com/archives/C6G5104SF/p1651682310836059>
on the community Slack channel reports a roughly 20% perf degradation
in the BK client read/write path. Although the community is still
looking into it, I see patterns that we have run into before
with our fork.
BK has been chill in the past, with Pulsar having most of the activity, but
when BK hits an issue like this it puts everything at risk, so I think we
should have better quality gates for BK than the ones that currently exist.

*On the topic of perf degradation:*
It's worth checking if the netty upgrade has any effect on this. In the
past we've seen perf degradation in netty for mTLS connections on different
JDKs.
So much so that I opened an issue on the netty GitHub
<https://github.com/netty/netty/issues/11268> about this, which the netty
maintainer accepted as an issue.
That may be worth looking into, since it has been pointed out that
there are around 270 commits between the versions involved in the
degradation.

*Some things worth looking into so that this does not happen again
(community discussion):*

*1. Upgrading dependencies flagged as having vulns (without perf testing /
without figuring out whether the vuln even applies to the way we use the dep).*
When it comes to upgrading dependencies, as a community we need to decide
whether we want to upgrade *every time*.
Most of the time, the CVEs are in some other part of the dependency's code,
while we use the dependency for a totally different purpose.
By upgrading anyway we end up causing more pain than benefit (interfaces
getting deprecated, interplay between dependencies with different
versions, etc.).

Something that we do internally is define which dependencies are in the
critical path and which are not (netty, JDKs etc. are in the critical path,
whereas, say, a third-order dependency of the vertx server is not).
We allow direct upgrades of non-critical dependencies.
For dependencies in the critical path, we have to prove the following:
- The CVE actually affects us (in the way we are using the dependency or,
in some cases, even if we don't use it).
- Perf testing results and a diff for the critical path, to prove there is
no perf degradation on upgrading.
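
As a rough illustration, the gate described above could be sketched like this. Everything specific here is an assumption made for the example: the contents of `CRITICAL`, the `UpgradeRequest` fields, and the 2% noise tolerance (the rule as stated is simply "no perf degradation"). The real list and thresholds would be whatever the community agrees on.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical classification; the real list is up to the community.
CRITICAL = {"netty", "jdk"}


@dataclass
class UpgradeRequest:
    dependency: str
    # Has someone shown that the CVE applies to the way we use the dependency?
    cve_affects_us: bool = False
    # Throughput change vs. the current version in percent (negative = slower).
    # None means no perf test was run.
    perf_delta_pct: Optional[float] = None


def may_upgrade(req: UpgradeRequest, max_regression_pct: float = 2.0) -> bool:
    """Return True if the upgrade passes the critical-path gate."""
    if req.dependency.lower() not in CRITICAL:
        # Non-critical dependencies can be upgraded directly.
        return True
    if not req.cve_affects_us:
        # Critical path: the CVE must be shown to affect us.
        return False
    if req.perf_delta_pct is None:
        # Critical path: perf results are mandatory.
        return False
    # Assumed tolerance for measurement noise, not part of the original rule.
    return req.perf_delta_pct >= -max_regression_pct
```

With this sketch, a netty upgrade that shows a 20% drop would be rejected even when the CVE applies, while a non-critical upgrade passes without any evidence attached.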

*[Community action item]*
-> If the community agrees on this model, we can decide which
dependencies are 'critical' and which are not, and follow the same model.

*2. Defining SLOs/SLAs*
We have that < 5 ms read/write goal somewhere, but I don't think it's front
and center.
While many BK users (companies) set their own goals in their own forks,
I think it would be a good idea for us as a community to define and codify
certain basic SLAs in terms of client / bookie read/write throughput.
Once defined, we can move on to the next step.
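
To make "codify" a bit more concrete, here is a minimal sketch of what a machine-checkable SLO list might look like. The only number taken from this thread is the < 5 ms read/write goal; the metric names and the `Slo` class are made up for illustration, and no throughput target is filled in because none has been agreed on.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Slo:
    name: str
    limit: float
    higher_is_better: bool  # True for throughput, False for latency

    def is_met(self, measured: float) -> bool:
        # Throughput must reach the limit; latency must stay strictly below it.
        if self.higher_is_better:
            return measured >= self.limit
        return measured < self.limit


# Hypothetical metric names; 5 ms is the read/write goal mentioned above.
# Throughput SLOs would be added here once the community defines them.
SLOS = [
    Slo("write_latency_ms", 5.0, higher_is_better=False),
    Slo("read_latency_ms", 5.0, higher_is_better=False),
]


def violations(measured: Dict[str, float]) -> List[str]:
    """Return the names of SLOs that are missing from or violated by a run."""
    failed = []
    for slo in SLOS:
        value = measured.get(slo.name)
        if value is None or not slo.is_met(value):
            failed.append(slo.name)
    return failed
```

A missing metric is treated as a violation here so that a benchmark run cannot pass by simply not reporting a number; that is a design choice of the sketch, not something decided anywhere.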

*[Community action item]*
-> Do you folks think it is worth defining these so we can design perf
workloads around them? If so, we can start discussions on this.

*3. Routine perf testing*
IIRC we have the bookkeeper benchmark, but I'm not sure it's enough for perf
testing and for checking that each release adheres to the SLAs defined above.
Any ideas on this?
I'm wondering if we can repurpose some benchmark that is generic
enough, @Venkateswara
Rao Jujjuri <jujj...@gmail.com>?
Maybe there is something on the Pulsar side that we could reuse? After
that, we should see if this can become part of our CI/CD GitHub Actions.
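
For the CI part, one possible shape is a small script that compares a candidate build's benchmark numbers against a stored baseline and flags any metric that dropped too far. The sketch below is only an assumption about how that could look: the metric names, the 5% threshold, and the idea of keeping results as simple name/value maps are all hypothetical, and it says nothing about which benchmark produces the numbers.

```python
from typing import Dict, List, Tuple


def pct_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline (negative = drop)."""
    return (candidate - baseline) * 100.0 / baseline


def check_regression(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_drop_pct: float = 5.0,  # assumed threshold, to be tuned
) -> List[Tuple[str, float]]:
    """Return (metric, change in %) for throughput metrics that dropped too far."""
    failures = []
    for metric, base_value in baseline.items():
        change = pct_change(base_value, candidate[metric])
        if change < -max_drop_pct:
            failures.append((metric, round(change, 1)))
    return failures
```

A CI job could call `check_regression` with the last release's numbers as the baseline and fail the build when the returned list is non-empty. A drop like the 20% one discussed here would then show up at PR time rather than after a release.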

*[Community action item]*
-> We can come back to this once we resolve the earlier discussions.

Regards,
Anup
-- 
Anup Ghatage
www.ghatage.com
