Hi all, Any other suggestions or objections to the proposal?
Thanks, Murali On Thu, May 23, 2024 at 10:18 AM Muralidhar Basani < muralidhar.bas...@aiven.io> wrote: > Thanks Greg. > I have updated KIP with details on optimisation of LITERAL too. > > Regarding limitations in performance by introducing MATCH is definitely a > question. > - By optimizing LITERAL and PREFIX we are gaining a few nano secs I think. > (described in kip) > - MATCH can be introduced only with a configuration. So by default, we can > disable the MATCH check (ex : acls.pattern.match.enable=false), and if > customer enables the config, only then add the lookup in the matchingAcls() > - With the proposal already described in KIP for MATCH, we may not have > any limitations., rather will be efficient. > > Maybe we are in good shape with these propositions > > Thanks, > Murali > > > > On Tue, May 21, 2024 at 6:15 PM Greg Harris <greg.har...@aiven.io.invalid> > wrote: > >> Hi Murali, >> >> I don't have a trie library in mind. I looked at the current >> implementation of the StandardAuthorizer and found that we are already >> benefiting from the prefix structure in the implementation [1]. The >> current implementation appears to be a TreePSet [2]. >> >> Now, we've already made this tradeoff once with PREFIX: Prefixes are >> less structured than literals, because with literals you can use a >> hashing algorithm to jump directly to your relevant ACLs in O(1), but >> with a prefix you either need to do multiple lookups, or some sort of >> O(log(n)) lookup. And we determined that the ultimate limitation in >> performance was worth it for the expressiveness. >> We're making this tradeoff again with MATCH acls: wildcards are less >> structured than prefixes or literals, because of the reasons I >> mentioned earlier. We need to judge now if the ultimate limitation in >> performance is worth it. >> >> I think your strategy for using the optimized layout for prefix and >> literal matches is smart, because there does seem to be a gap in >> performance possible. It makes me wonder why the optimized layout for >> literals was not used when prefixes were added. Literal lookups still >> go through the tree lookup, when they could be moved to a hash-lookup >> instead. >> That would allow users to "choose" for themselves on a convenience vs >> performance scale: Smaller use-cases can add a single convenient >> MATCH, and larger use-cases can add the multiple optimized PREFIXes. >> >> [1] >> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339 >> [2] >> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34 >> >> Thanks, >> Greg >> >> On Tue, May 21, 2024 at 8:23 AM Muralidhar Basani >> <muralidhar.bas...@aiven.io.invalid> wrote: >> > >> > @greg which library of trie you were thinking of ? There is one in the >> > commons-collection I see. >> > >> > On Fri, May 17, 2024 at 3:55 PM Claude Warren <cla...@xenei.com> wrote: >> > >> > > > >> > > > This has implications for execution complexity: If we can't compute >> > > > whether two patterns overlap, then we need to run both of them on >> each >> > > > piece of input to test if they both match. Under the current >> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but >> that >> > > > option wouldn't be available to us with MATCH. >> > > > >> > > >> > > If we consider the case of an asterisk representing 1 or more >> characters >> > > then determining if 2 patterns overlap can be fairly simple and very >> fast. >> > > Let's call the text from the ACL the target, and the text from the >> wildcard >> > > matcher the candidate. >> > > >> > > When a wildcard pattern, excluding '*', is created: >> > > >> > > - the candidate text is broken into fragments separated by >> characters >> > > that are not Character.isLetterOrDigit() (See >> > > https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html >> ). >> > > - fragments that contain 1 character are ignored. >> > > - fragments that contains 2 or more characters are scanned and >> every >> > > every pair of characters used to create a Hasher (See >> > > >> > > >> https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html >> > > ) >> > > that hasher is added to a Bloom filter associated with the wildcard >> > > pattern. >> > > >> > > When a target is being evaluated and matching wildcard entries are to >> be >> > > located. Split and create a bloom filter using entry and the same >> strategy >> > > as for the wildcard patterns above. These bloom filters will have >> had more >> > > pairs of characters added than for a matching wildcard pattern. >> > > >> > > Each filter contains the pattern for the unique pairs in the >> fragments of >> > > the original text: >> > > >> > > Now we can check the Bloom filters. >> > > >> > > To find potential matching patterns we can check to see if the Bloom >> filter >> > > for the pattern is contained within the bloom filter for the entry >> text. >> > > If so then we know that it is likely that the pairs of characters >> specified >> > > in the wild card (non-wildcard text) appear in the entry text. >> > > >> > > At this point we can evaluate the reduced number of patterns to see >> which >> > > ones actually match. >> > > >> > > Having reduced the candidates to the matching patterns we can then >> use a >> > > standard measure of similarity like the Levenshtein distance to >> determine >> > > which candidates require the fewest edits to evolve into the target. >> The >> > > one with the fewest edits is the most specific and should be applied. >> > > >> > > Now if you want to know which patterns overlap you have a similar >> process. >> > > >> > > Each Bloom filter can calculate the estimated number of unique >> fragments >> > > that were added to it. If the filters are sorted by this estimation, >> the >> > > ones with the highest counts may contain the ones with the lowest >> counts, >> > > but a lower count can never contain a higher count. The calculation >> to >> > > perform the estimation can be skipped and the cardinality value of >> each >> > > Bloom filter can be used instead. You can then check the smaller >> filters >> > > against the larger ones and find where one candidate is contained >> within >> > > another. >> > > >> > > If you want to know if they intersect the BloomFilter from >> > > commons-collections has an estimateIntersection as well as an >> estimateUnion >> > > method. Quick tests can be made between filters to see if there is >> any >> > > overlap before more complex analysis is performed. >> > > >> > > The solution here does not make the final comparisons easier, it >> simply >> > > reduces the search space to find the items that need to be compared. >> > > >> > > >> > > On Mon, May 6, 2024 at 9:03 PM Greg Harris >> <greg.har...@aiven.io.invalid> >> > > wrote: >> > > >> > > > Hi Murali, >> > > > >> > > > Thanks for the KIP! >> > > > >> > > > I think I understand the motivation for this KIP in situations where >> > > > there are a "cross product" of topics for two or more variables X >> and >> > > > Y, and want to write ACLs for each of the variable axes. >> > > > If you format your topics "X-Y-suffix", it's not easy to write rules >> > > > that apply to all "Y" topics, because you need to enumerate all of >> the >> > > > "X" values, and the problem persists even if you reorder the topic >> > > > name. >> > > > >> > > > In my recent work on KIP-986 I found it necessary to introduce >> > > > "namespaces" to group topics together, and I was going to replicate >> > > > the ACL system to specify those namespaces. This change to the ACL >> > > > system could increase the expressiveness and complexity of that >> > > > feature, if it is ever implemented. >> > > > One of the primitives I needed when specifying namespaces was the >> > > > ability to tell when two namespaces overlapped (i.e. does there >> exist >> > > > any topic which is present in both namespaces). This is trivial to >> do >> > > > with the current PREFIX and LITERAL system, as we can find the >> > > > maximum-length common prefix with just some length comparisons and >> > > > equality checks. >> > > > I considered specifying namespaces via regular expressions, and >> found >> > > > that it was computationally much more difficult. Computing the >> > > > intersection of two regexes appears to be exponential in the length >> of >> > > > the regexes, leading me to avoid adding it. >> > > > >> > > > I understand that you're not suggesting full REGEX support, and that >> > > > "namespaces" don't need to support MATCH, but I think MATCH may run >> > > > into related difficulties. Any MATCH can overlap with any other >> MATCH >> > > > or PREFIX if it includes a sufficient number of wildcards. For >> > > > example: >> > > > MATCH *-accounts-* has overlap with PREFIX nl as they can both match >> > > > "nl-accounts-localtopic", but that isn't sensitive to the contents >> > > > "nl", it is true for any PREFIX. >> > > > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can >> > > > both match "nl-accounts-localtopic", but that isn't actually >> sensitive >> > > > to the contents "localtopic", it's true for any MATCH which >> includes a >> > > > wildcard at the beginning. >> > > > >> > > > This has implications for execution complexity: If we can't compute >> > > > whether two patterns overlap, then we need to run both of them on >> each >> > > > piece of input to test if they both match. Under the current >> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but >> that >> > > > option wouldn't be available to us with MATCH. >> > > > >> > > > The current system makes users evaluate a trade-off: >> > > > 1. Optimize the number of ACLs by organizing topics according to >> > > > prefixes (for example, "accounts-localtopic-nl" and PREFIX >> "accounts", >> > > > PREFIX "accounts-localtopic") >> > > > 2. Use less-structured topic names, with a corresponding ACL scheme >> > > > that has more individual rules. >> > > > The system currently informs users of this tradeoff by making them >> > > > write multiple ACLs, and making them think "there has got to be a >> > > > better way!". Perhaps we can find a better way to surface this best >> > > > practice, or better inform users about it. >> > > > >> > > > I understand that there are going to be situations more complex than >> > > > your example, where multiple individual rules will always be >> necessary >> > > > with only PREFIX evaluation. I think even in those situations, a >> > > > number of efficient-to-evaluate rules is preferable to just one >> > > > expensive-to-evaluate rule. >> > > > >> > > > One alternative that I thought of could be "PARAMETERIZED" ACLs >> which >> > > > are like PREFIXED, but allow some parameter substitution. For >> example >> > > > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here, >> > > > but this isn't actually a regex, and wouldn't allow arbitrary >> numbers >> > > > of characters, or the * or + operators. >> > > > In the background it could evaluate exactly like the 3 individual >> > > > PREFIX rules, but be easier to evaluate on the backend, and support >> > > > the intersection query I mentioned earlier. It could also support >> > > > [a-zA-Z] notation in case the parameter values aren't known ahead of >> > > > time, but have a fixed length. >> > > > >> > > > Thanks, >> > > > Greg >> > > > >> > > > On Mon, May 6, 2024 at 11:17 AM Claude Warren <cla...@xenei.com> >> wrote: >> > > > > >> > > > > I have an idea for how to reduce the time for ACL lookups in >> general >> > > and >> > > > > particularly where wildcards are involved using sequence >> > > > > characterization techniques from bioinformatics. But I need a >> set of >> > > ACL >> > > > > patterns and associated topics to test with. >> > > > > >> > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <ocadar...@gmail.com> >> > > wrote: >> > > > > >> > > > > > Hi, Murali. >> > > > > > >> > > > > > First, could you add the KIP-1042 to the index ( >> > > > > > >> > > > > > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals >> > > > > > ) >> > > > > > as well so that everyone can find it easily? >> > > > > > >> > > > > > I took a look at the KIP, then I have 2 questions: >> > > > > > >> > > > > > 1. Though the new MATCH resource pattern type may reduce the >> effort >> > > of >> > > > > > adding ACLs in some cases, do you have any concrete use case >> you are >> > > in >> > > > > > mind? (When prefixed ACL was introduced in KIP-290, there was a >> > > > use-case >> > > > > > that using it for implementing multi-tenancy) >> > > > > > >> > > > > > 2. As you may know, ACL lookup is in the hot-path which the >> > > > performance is >> > > > > > very important. ( >> > > > > > >> > > > > > >> > > > >> > > >> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539 >> > > > > > ). >> > > > > > Do you have ideas how do we update `matchingAcls` to support >> > > > MATCH-type ACL >> > > > > > without introducing performance issue? >> > > > > > >> > > > > > >> > > > > > Thanks, >> > > > > > >> > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <claude.war...@aiven.io >> > > .invalid>: >> > > > > > >> > > > > > > As I wrote in [1], the ACL evaluation algorithm needs to be >> > > specified >> > > > > > with >> > > > > > > respect to the specificity of the pattern so that we know >> exactly >> > > > which >> > > > > > if >> > > > > > > *-accounts-* takes precedence over nl-accounts-* or visa >> versa. >> > > > > > > >> > > > > > > I think that we should spell out that precedence is evaluated >> as >> > > > follows: >> > > > > > > >> > > > > > > 1. Remove patterns that do not match >> > > > > > > 2. More specific patterns take precedence over less specific >> > > patterns >> > > > > > > 3. for patterns of the same precedence DENY overrides ALLOW >> > > > > > > >> > > > > > > Determining specificity: >> > > > > > > >> > > > > > > Specificity is based on the Levenshtein distance between the >> > > pattern >> > > > and >> > > > > > > the text being evaluated. The lower the distance the more >> specific >> > > > the >> > > > > > > rule. >> > > > > > > Using the topic name: nl-accounts-localtopic we can evaluate >> the >> > > > > > > Levenshtein distance for various patterns. >> > > > > > > nl-accounts-localtopic = 0 >> > > > > > > *-accounts-localtopic = 2 >> > > > > > > nl-accounts-local* = 5 >> > > > > > > *-accounts-local* = 7 >> > > > > > > nl-accounts-* = 10 >> > > > > > > *-accounts-* = 12 >> > > > > > > >> > > > > > > In the special case of matching principles User matches are >> more >> > > > specific >> > > > > > > than Group matches. >> > > > > > > >> > > > > > > I don't know if this should be added to KIP-1042 or presented >> as a >> > > > new >> > > > > > KIP. >> > > > > > > >> > > > > > > Claude >> > > > > > > >> > > > > > > [1] >> > > https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f >> > > > > > > < >> > > > > > > >> > > > > > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls >> > > > > > > > >> > > > > > > >> > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren < >> cla...@xenei.com> >> > > > wrote: >> > > > > > > >> > > > > > > > Took me awhile to find it but the link to the KIP is >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls >> > > > > > > > >> > > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani < >> > > > murali.bas...@gmail.com >> > > > > > > >> > > > > > > > wrote: >> > > > > > > > >> > > > > > > > > Hello, >> > > > > > > > > >> > > > > > > > > I'd like to propose a suggestion to our resource patterns >> in >> > > > Kafka >> > > > > > > ACLs. >> > > > > > > > > >> > > > > > > > > Currently, when adding new ACLs in Kafka, we have two >> types of >> > > > > > resource >> > > > > > > > > patterns for topics: >> > > > > > > > > >> > > > > > > > > - LITERAL >> > > > > > > > > - PREFIXED >> > > > > > > > > >> > > > > > > > > However, when it comes to listing or removing ACLs, we >> have a >> > > > couple >> > > > > > > more >> > > > > > > > > options: >> > > > > > > > > >> > > > > > > > > - MATCH >> > > > > > > > > - ANY (will match any pattern type) >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > If we can extend creating acls as well with 'MATCH' >> pattern >> > > > type, it >> > > > > > > > would >> > > > > > > > > be very beneficial. Even though this kind of acl should be >> > > > created >> > > > > > with >> > > > > > > > > utmost care, it will help organizations streamline their >> ACL >> > > > > > management >> > > > > > > > > processes. >> > > > > > > > > >> > > > > > > > > Example scenarios : >> > > > > > > > > >> > > > > > > > > Let's say we need to create ACLs for the following six >> topics: >> > > > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic, >> > > > > > > de-accounts-localtopic, >> > > > > > > > > de-accounts-remotetopic, cz-accounts-localtopic, >> > > > > > > cz-accounts-remotetopic >> > > > > > > > > >> > > > > > > > > Currently, we achieve this using existing functionality by >> > > > creating >> > > > > > > three >> > > > > > > > > prefixed ACLs as shown below: >> > > > > > > > > >> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ >> > > > > > > > > > --add \ >> > > > > > > > > > --allow-principal >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > >> > > >> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown >> > > > > > > > > > \ >> > > > > > > > > > --producer \ >> > > > > > > > > > --topic nl-accounts- \ >> > > > > > > > > > --resource-pattern-type prefixed >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ >> > > > > > > > > > --add \ >> > > > > > > > > > --allow-principal >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > >> > > >> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown >> > > > > > > > > > \ >> > > > > > > > > > --producer \ >> > > > > > > > > > --topic de-accounts- \ >> > > > > > > > > > --resource-pattern-type prefixed >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ >> > > > > > > > > > --add \ >> > > > > > > > > > --allow-principal >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > >> > > >> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown >> > > > > > > > > > \ >> > > > > > > > > > --producer \ >> > > > > > > > > > --topic cz-accounts- \ >> > > > > > > > > > --resource-pattern-type prefixed >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > However, if we had the 'MATCH' pattern type available, we >> could >> > > > > > > > accomplish >> > > > > > > > > this with a single ACL, as illustrated here: >> > > > > > > > > >> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ >> > > > > > > > > > --add \ >> > > > > > > > > > --allow-principal >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > >> > > >> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown >> > > > > > > > > > \ >> > > > > > > > > > --producer \ >> > > > > > > > > > --topic *-accounts-* \ >> > > > > > > > > > --resource-pattern-type match >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > This pattern closely resembles PREFIXED but offers broader >> > > > allow/deny >> > > > > > > > > rules. >> > > > > > > > > >> > > > > > > > > Implementing this change could significantly reduce the >> effort >> > > in >> > > > > > > several >> > > > > > > > > acl management processes. >> > > > > > > > > >> > > > > > > > > I welcome your thoughts and any concerns you may have >> regarding >> > > > this >> > > > > > > > > proposal. >> > > > > > > > > >> > > > > > > > > Thanks, >> > > > > > > > > Murali >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > -- >> > > > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > > -- >> > > > > > ======================== >> > > > > > Okada Haruki >> > > > > > ocadar...@gmail.com >> > > > > > ======================== >> > > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren >> > > > >> > > >> > > >> > > -- >> > > LinkedIn: http://www.linkedin.com/in/claudewarren >> > > >> >