d8tltanc commented on a change in pull request #9485:
URL: https://github.com/apache/kafka/pull/9485#discussion_r533765909



##########
File path: 
jmh-benchmarks/src/main/java/org/apache/kafka/jmh/acl/AclAuthorizerBenchmark.java
##########
@@ -69,33 +70,39 @@
 @BenchmarkMode(Mode.AverageTime)
 @OutputTimeUnit(TimeUnit.MILLISECONDS)
 public class AclAuthorizerBenchmark {
-    @Param({"10000", "50000", "200000"})
+    @Param({"10000", "40000", "80000"})

Review comment:
       The `authorizeByResourceType()` implementation in AclAuthorizer has two key characteristics:
   1. If any "allow resource" of the given ACE does not have a dominant "deny resource", the API returns immediately.
   2. The complexity is O(n*m), where `n` is the number of "allow resources" of the given ACE and `m` is the number of "deny resources" of the given ACE; it does not depend on the total number of ACEs in the cluster (both points are illustrated in the sketch right after this list).
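
   A minimal Java sketch of that shape (hypothetical names; this is illustrative only, not the actual Scala implementation in AclAuthorizer):

```java
import java.util.Set;

// Hypothetical sketch of the O(n*m) check described above: for each "allow resource"
// look for a dominating "deny resource", and return as soon as one allow resource
// has no dominating deny. Not the actual (Scala) AclAuthorizer code.
final class ResourceTypeCheckSketch {

    static boolean hasAnyAccess(Set<String> allowResources, Set<String> denyResources) {
        for (String allow : allowResources) {            // n "allow resources" of the ACE
            boolean dominated = false;
            for (String deny : denyResources) {          // m "deny resources" of the ACE
                if (dominates(deny, allow)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                return true;                             // characteristic 1: return immediately
            }
        }
        return false;
    }

    // Simplified dominance check: wildcard, literal, or prefix match.
    private static boolean dominates(String denyPattern, String resourceName) {
        return denyPattern.equals("*")
            || resourceName.equals(denyPattern)
            || resourceName.startsWith(denyPattern);
    }
}
```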
   
   Point 1 means that, for a given ACE, if `p%` of its "allow resources" have a dominant "deny resource" and `resourceCount` is `r`, then on average the API returns after checking about `r * p * 0.01` "allow resources".
   a) If we distribute the dominant "deny resources" evenly, e.g. use `(loop index % something)` to decide which "allow resources" get a dominant "deny resource", we iterate the same number of "allow resources" (`r * p * 0.01`) and return from the API at the same point on every call (a sketch of such a setup follows after point b).
   b) If we instead decide randomly which "allow resources" get a dominant "deny resource", the result is too noisy: depending on the randomization algorithm and seed, we may iterate only one resource or all of them.
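
   A deterministic setup along those lines could look like this (hypothetical helper; this is not the code in AclAuthorizerBenchmark):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: give every stride-th "allow resource" a literal "deny resource"
// that dominates it, so that roughly a fraction p of the allow resources end up dominated.
final class EvenDenyDistribution {

    static List<String> dominantDenyResources(List<String> allowResources, double p) {
        List<String> denyResources = new ArrayList<>();
        int stride = Math.max(1, (int) Math.round(1.0 / p)); // p = 0.25 -> every 4th resource
        for (int i = 0; i < allowResources.size(); i++) {
            if (i % stride == 0) {                           // "loop index % something"
                denyResources.add(allowResources.get(i));    // literal match dominates the allow
            }
        }
        return denyResources;
    }
}
```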
   
   Point 2 means that the API time cost is not related to the number of ACEs, but grows quadratically as `resourceCount` increases. Under the assumption in point 1, the actual cost would be on the order of `r * r * p * 0.01` dominance checks (rough numbers below).
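
   A quick back-of-envelope check of that growth for the benchmark's `resourceCount` values, assuming the worst case `p = 1` (illustrative only):

```java
// Worst-case dominance checks per call: roughly r^2 when p = 1 (every allow
// resource has a dominant deny resource, so there is no early return).
public class WorstCaseEstimate {
    public static void main(String[] args) {
        for (int r : new int[]{10_000, 40_000, 80_000, 200_000}) {
            System.out.printf("resourceCount=%d -> ~%.1e checks%n", r, (double) r * r);
        }
        // 10k -> 1.0e8, 40k -> 1.6e9, 80k -> 6.4e9, 200k -> 4.0e10
    }
}
```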
   
   As a result, we should first get an insight into how long the worst case takes, call it `t`. Then, for any reasonable value of `p`, we can estimate the API cost as roughly `t * p` (e.g. if the worst case takes `t` = 100 ms and `p` = 0.1, the estimated cost is about 10 ms).
   
   So I was directly testing the worst case, where `p = 1`, meaning 100% of the "allow resources" have a dominant "deny resource"; the complexity is hence O(r^2). It's rare for a cluster to have 200k "allow resources" and 200k corresponding dominant "deny resources" per user, and it's not fair to pair a relatively small `aclCount` with a much larger `resourceCount`, because the API optimizes performance by indexing on the ACE.
   





