Hey folks,

A post <https://apachebookkeeper.slack.com/archives/C6G5104SF/p1651682310836059> on the community Slack channel states that there is a 20% perf degradation in the BK client read/write path. Although the community is still looking into it, I see patterns that we have seen in the past with our fork. BK has been chill in the past, with Pulsar having most of the activity, but when BK hits an issue like this, it puts everything at risk, so I think we should have better quality gates for BK than the ones that currently exist.

*On the topic of perf degradation:*

It's worth checking whether the netty upgrade has any effect on this. In the past we've seen perf degradation in netty for mTLS connections on different JDKs, so much so that I opened an issue on the netty GitHub <https://github.com/netty/netty/issues/11268> about it, which the netty maintainer accepted as an issue. Maybe that is worth looking into, as it has been pointed out that there have been around 270 commits between the versions with the degradation.

*Some things worth looking into so that this doesn't happen again (community discussion):*

*1. Upgrading dependencies marked as having vulns (without perf testing / without figuring out whether it really even applies to the way we use the dep).*

When it comes to upgrading dependencies, as a community we need to decide whether we want to upgrade *every time*. Most of the time, the CVE is in some part of the dependency's code that we don't touch, since we use the dependency for a totally different purpose. But by upgrading we end up causing more pain than benefit (via some interfaces getting deprecated, interplay between dependencies with different versions, etc.).

Something that we do internally is define which dependencies are in the critical path and which are not (netty, JDKs etc. are in the critical path, vs. say a third-order dependency for the vertx server). So we allow direct upgrades of non-critical dependencies. For dependencies in the critical path, we have to prove the following:
- The CVE actually affects us. (In the way we are using the dependency or in some cases even if we don't use it)
- Perf testing results and a diff for the critical path to prove there is no perf degradation on upgrading.

*[Community action item]* -> If the community agrees on this model, we can decide which dependencies are 'critical' and which are not, and follow the same model.

*2. Defining SLOs/SLAs*

We have that < 5 ms read/write goal somewhere, but I don't think it's front and center.
While many BK users (companies) set their own goals in their own forks, I think it would be a good idea for us as a community to define and codify certain basic SLAs in terms of client / bookie read/write throughput. Once defined, we can move on to the next step.

*[Community action item]* -> Do you folks think it is worth defining these so we can design perf workloads around them? If so, we can start discussions on this.

*3. Routine perf testing*

IIRC we have the bookkeeper benchmark, but I'm not sure if it's enough for perf testing and for checking whether each release adheres to the SLAs defined above. Any ideas on this? I'm wondering if we can repurpose some benchmark that is generic enough, @Venkateswara Rao Jujjuri? Maybe there is something on the Pulsar side that we could reuse? After that, we should see if this can be part of our CI/CD GitHub Actions.

*[Community action item]* -> We can come to this once we resolve the earlier discussion.

Regards,
Anup

--
Anup Ghatage
www.ghatage.com
