Hi, Just bumping this thread again. It seems no concerns have been raised so far.
I'll request votes in 2 weeks. Thanks, Murali On Tue, May 28, 2024 at 1:24 PM Muralidhar Basani < muralidhar.bas...@aiven.io> wrote: > Hi all, > > Any other suggestions or objections to the proposal? > > Thanks, > Murali > > On Thu, May 23, 2024 at 10:18 AM Muralidhar Basani < > muralidhar.bas...@aiven.io> wrote: > >> Thanks Greg. >> I have updated KIP with details on optimisation of LITERAL too. >> >> Regarding limitations in performance by introducing MATCH is definitely a >> question. >> - By optimizing LITERAL and PREFIX we are gaining a few nano secs I >> think. (described in kip) >> - MATCH can be introduced only with a configuration. So by default, we >> can disable the MATCH check (ex : acls.pattern.match.enable=false), and if >> customer enables the config, only then add the lookup in the matchingAcls() >> - With the proposal already described in KIP for MATCH, we may not have >> any limitations., rather will be efficient. >> >> Maybe we are in good shape with these propositions >> >> Thanks, >> Murali >> >> >> >> On Tue, May 21, 2024 at 6:15 PM Greg Harris <greg.har...@aiven.io.invalid> >> wrote: >> >>> Hi Murali, >>> >>> I don't have a trie library in mind. I looked at the current >>> implementation of the StandardAuthorizer and found that we are already >>> benefiting from the prefix structure in the implementation [1]. The >>> current implementation appears to be a TreePSet [2]. >>> >>> Now, we've already made this tradeoff once with PREFIX: Prefixes are >>> less structured than literals, because with literals you can use a >>> hashing algorithm to jump directly to your relevant ACLs in O(1), but >>> with a prefix you either need to do multiple lookups, or some sort of >>> O(log(n)) lookup. And we determined that the ultimate limitation in >>> performance was worth it for the expressiveness. >>> We're making this tradeoff again with MATCH acls: wildcards are less >>> structured than prefixes or literals, because of the reasons I >>> mentioned earlier. We need to judge now if the ultimate limitation in >>> performance is worth it. >>> >>> I think your strategy for using the optimized layout for prefix and >>> literal matches is smart, because there does seem to be a gap in >>> performance possible. It makes me wonder why the optimized layout for >>> literals was not used when prefixes were added. Literal lookups still >>> go through the tree lookup, when they could be moved to a hash-lookup >>> instead. >>> That would allow users to "choose" for themselves on a convenience vs >>> performance scale: Smaller use-cases can add a single convenient >>> MATCH, and larger use-cases can add the multiple optimized PREFIXes. >>> >>> [1] >>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339 >>> [2] >>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34 >>> >>> Thanks, >>> Greg >>> >>> On Tue, May 21, 2024 at 8:23 AM Muralidhar Basani >>> <muralidhar.bas...@aiven.io.invalid> wrote: >>> > >>> > @greg which library of trie you were thinking of ? There is one in the >>> > commons-collection I see. >>> > >>> > On Fri, May 17, 2024 at 3:55 PM Claude Warren <cla...@xenei.com> >>> wrote: >>> > >>> > > > >>> > > > This has implications for execution complexity: If we can't compute >>> > > > whether two patterns overlap, then we need to run both of them on >>> each >>> > > > piece of input to test if they both match. Under the current >>> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but >>> that >>> > > > option wouldn't be available to us with MATCH. >>> > > > >>> > > >>> > > If we consider the case of an asterisk representing 1 or more >>> characters >>> > > then determining if 2 patterns overlap can be fairly simple and very >>> fast. >>> > > Let's call the text from the ACL the target, and the text from the >>> wildcard >>> > > matcher the candidate. >>> > > >>> > > When a wildcard pattern, excluding '*', is created: >>> > > >>> > > - the candidate text is broken into fragments separated by >>> characters >>> > > that are not Character.isLetterOrDigit() (See >>> > > >>> https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html). >>> > > - fragments that contain 1 character are ignored. >>> > > - fragments that contains 2 or more characters are scanned and >>> every >>> > > every pair of characters used to create a Hasher (See >>> > > >>> > > >>> https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html >>> > > ) >>> > > that hasher is added to a Bloom filter associated with the >>> wildcard >>> > > pattern. >>> > > >>> > > When a target is being evaluated and matching wildcard entries are >>> to be >>> > > located. Split and create a bloom filter using entry and the same >>> strategy >>> > > as for the wildcard patterns above. These bloom filters will have >>> had more >>> > > pairs of characters added than for a matching wildcard pattern. >>> > > >>> > > Each filter contains the pattern for the unique pairs in the >>> fragments of >>> > > the original text: >>> > > >>> > > Now we can check the Bloom filters. >>> > > >>> > > To find potential matching patterns we can check to see if the Bloom >>> filter >>> > > for the pattern is contained within the bloom filter for the entry >>> text. >>> > > If so then we know that it is likely that the pairs of characters >>> specified >>> > > in the wild card (non-wildcard text) appear in the entry text. >>> > > >>> > > At this point we can evaluate the reduced number of patterns to see >>> which >>> > > ones actually match. >>> > > >>> > > Having reduced the candidates to the matching patterns we can then >>> use a >>> > > standard measure of similarity like the Levenshtein distance to >>> determine >>> > > which candidates require the fewest edits to evolve into the >>> target. The >>> > > one with the fewest edits is the most specific and should be applied. >>> > > >>> > > Now if you want to know which patterns overlap you have a similar >>> process. >>> > > >>> > > Each Bloom filter can calculate the estimated number of unique >>> fragments >>> > > that were added to it. If the filters are sorted by this >>> estimation, the >>> > > ones with the highest counts may contain the ones with the lowest >>> counts, >>> > > but a lower count can never contain a higher count. The calculation >>> to >>> > > perform the estimation can be skipped and the cardinality value of >>> each >>> > > Bloom filter can be used instead. You can then check the smaller >>> filters >>> > > against the larger ones and find where one candidate is contained >>> within >>> > > another. >>> > > >>> > > If you want to know if they intersect the BloomFilter from >>> > > commons-collections has an estimateIntersection as well as an >>> estimateUnion >>> > > method. Quick tests can be made between filters to see if there is >>> any >>> > > overlap before more complex analysis is performed. >>> > > >>> > > The solution here does not make the final comparisons easier, it >>> simply >>> > > reduces the search space to find the items that need to be compared. >>> > > >>> > > >>> > > On Mon, May 6, 2024 at 9:03 PM Greg Harris >>> <greg.har...@aiven.io.invalid> >>> > > wrote: >>> > > >>> > > > Hi Murali, >>> > > > >>> > > > Thanks for the KIP! >>> > > > >>> > > > I think I understand the motivation for this KIP in situations >>> where >>> > > > there are a "cross product" of topics for two or more variables X >>> and >>> > > > Y, and want to write ACLs for each of the variable axes. >>> > > > If you format your topics "X-Y-suffix", it's not easy to write >>> rules >>> > > > that apply to all "Y" topics, because you need to enumerate all of >>> the >>> > > > "X" values, and the problem persists even if you reorder the topic >>> > > > name. >>> > > > >>> > > > In my recent work on KIP-986 I found it necessary to introduce >>> > > > "namespaces" to group topics together, and I was going to replicate >>> > > > the ACL system to specify those namespaces. This change to the ACL >>> > > > system could increase the expressiveness and complexity of that >>> > > > feature, if it is ever implemented. >>> > > > One of the primitives I needed when specifying namespaces was the >>> > > > ability to tell when two namespaces overlapped (i.e. does there >>> exist >>> > > > any topic which is present in both namespaces). This is trivial to >>> do >>> > > > with the current PREFIX and LITERAL system, as we can find the >>> > > > maximum-length common prefix with just some length comparisons and >>> > > > equality checks. >>> > > > I considered specifying namespaces via regular expressions, and >>> found >>> > > > that it was computationally much more difficult. Computing the >>> > > > intersection of two regexes appears to be exponential in the >>> length of >>> > > > the regexes, leading me to avoid adding it. >>> > > > >>> > > > I understand that you're not suggesting full REGEX support, and >>> that >>> > > > "namespaces" don't need to support MATCH, but I think MATCH may run >>> > > > into related difficulties. Any MATCH can overlap with any other >>> MATCH >>> > > > or PREFIX if it includes a sufficient number of wildcards. For >>> > > > example: >>> > > > MATCH *-accounts-* has overlap with PREFIX nl as they can both >>> match >>> > > > "nl-accounts-localtopic", but that isn't sensitive to the contents >>> > > > "nl", it is true for any PREFIX. >>> > > > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can >>> > > > both match "nl-accounts-localtopic", but that isn't actually >>> sensitive >>> > > > to the contents "localtopic", it's true for any MATCH which >>> includes a >>> > > > wildcard at the beginning. >>> > > > >>> > > > This has implications for execution complexity: If we can't compute >>> > > > whether two patterns overlap, then we need to run both of them on >>> each >>> > > > piece of input to test if they both match. Under the current >>> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but >>> that >>> > > > option wouldn't be available to us with MATCH. >>> > > > >>> > > > The current system makes users evaluate a trade-off: >>> > > > 1. Optimize the number of ACLs by organizing topics according to >>> > > > prefixes (for example, "accounts-localtopic-nl" and PREFIX >>> "accounts", >>> > > > PREFIX "accounts-localtopic") >>> > > > 2. Use less-structured topic names, with a corresponding ACL scheme >>> > > > that has more individual rules. >>> > > > The system currently informs users of this tradeoff by making them >>> > > > write multiple ACLs, and making them think "there has got to be a >>> > > > better way!". Perhaps we can find a better way to surface this best >>> > > > practice, or better inform users about it. >>> > > > >>> > > > I understand that there are going to be situations more complex >>> than >>> > > > your example, where multiple individual rules will always be >>> necessary >>> > > > with only PREFIX evaluation. I think even in those situations, a >>> > > > number of efficient-to-evaluate rules is preferable to just one >>> > > > expensive-to-evaluate rule. >>> > > > >>> > > > One alternative that I thought of could be "PARAMETERIZED" ACLs >>> which >>> > > > are like PREFIXED, but allow some parameter substitution. For >>> example >>> > > > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax >>> here, >>> > > > but this isn't actually a regex, and wouldn't allow arbitrary >>> numbers >>> > > > of characters, or the * or + operators. >>> > > > In the background it could evaluate exactly like the 3 individual >>> > > > PREFIX rules, but be easier to evaluate on the backend, and support >>> > > > the intersection query I mentioned earlier. It could also support >>> > > > [a-zA-Z] notation in case the parameter values aren't known ahead >>> of >>> > > > time, but have a fixed length. >>> > > > >>> > > > Thanks, >>> > > > Greg >>> > > > >>> > > > On Mon, May 6, 2024 at 11:17 AM Claude Warren <cla...@xenei.com> >>> wrote: >>> > > > > >>> > > > > I have an idea for how to reduce the time for ACL lookups in >>> general >>> > > and >>> > > > > particularly where wildcards are involved using sequence >>> > > > > characterization techniques from bioinformatics. But I need a >>> set of >>> > > ACL >>> > > > > patterns and associated topics to test with. >>> > > > > >>> > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <ocadar...@gmail.com >>> > >>> > > wrote: >>> > > > > >>> > > > > > Hi, Murali. >>> > > > > > >>> > > > > > First, could you add the KIP-1042 to the index ( >>> > > > > > >>> > > > > > >>> > > > >>> > > >>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals >>> > > > > > ) >>> > > > > > as well so that everyone can find it easily? >>> > > > > > >>> > > > > > I took a look at the KIP, then I have 2 questions: >>> > > > > > >>> > > > > > 1. Though the new MATCH resource pattern type may reduce the >>> effort >>> > > of >>> > > > > > adding ACLs in some cases, do you have any concrete use case >>> you are >>> > > in >>> > > > > > mind? (When prefixed ACL was introduced in KIP-290, there was a >>> > > > use-case >>> > > > > > that using it for implementing multi-tenancy) >>> > > > > > >>> > > > > > 2. As you may know, ACL lookup is in the hot-path which the >>> > > > performance is >>> > > > > > very important. ( >>> > > > > > >>> > > > > > >>> > > > >>> > > >>> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539 >>> > > > > > ). >>> > > > > > Do you have ideas how do we update `matchingAcls` to support >>> > > > MATCH-type ACL >>> > > > > > without introducing performance issue? >>> > > > > > >>> > > > > > >>> > > > > > Thanks, >>> > > > > > >>> > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <claude.war...@aiven.io >>> > > .invalid>: >>> > > > > > >>> > > > > > > As I wrote in [1], the ACL evaluation algorithm needs to be >>> > > specified >>> > > > > > with >>> > > > > > > respect to the specificity of the pattern so that we know >>> exactly >>> > > > which >>> > > > > > if >>> > > > > > > *-accounts-* takes precedence over nl-accounts-* or visa >>> versa. >>> > > > > > > >>> > > > > > > I think that we should spell out that precedence is >>> evaluated as >>> > > > follows: >>> > > > > > > >>> > > > > > > 1. Remove patterns that do not match >>> > > > > > > 2. More specific patterns take precedence over less specific >>> > > patterns >>> > > > > > > 3. for patterns of the same precedence DENY overrides ALLOW >>> > > > > > > >>> > > > > > > Determining specificity: >>> > > > > > > >>> > > > > > > Specificity is based on the Levenshtein distance between the >>> > > pattern >>> > > > and >>> > > > > > > the text being evaluated. The lower the distance the more >>> specific >>> > > > the >>> > > > > > > rule. >>> > > > > > > Using the topic name: nl-accounts-localtopic we can evaluate >>> the >>> > > > > > > Levenshtein distance for various patterns. >>> > > > > > > nl-accounts-localtopic = 0 >>> > > > > > > *-accounts-localtopic = 2 >>> > > > > > > nl-accounts-local* = 5 >>> > > > > > > *-accounts-local* = 7 >>> > > > > > > nl-accounts-* = 10 >>> > > > > > > *-accounts-* = 12 >>> > > > > > > >>> > > > > > > In the special case of matching principles User matches are >>> more >>> > > > specific >>> > > > > > > than Group matches. >>> > > > > > > >>> > > > > > > I don't know if this should be added to KIP-1042 or >>> presented as a >>> > > > new >>> > > > > > KIP. >>> > > > > > > >>> > > > > > > Claude >>> > > > > > > >>> > > > > > > [1] >>> > > https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f >>> > > > > > > < >>> > > > > > > >>> > > > > > >>> > > > >>> > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls >>> > > > > > > > >>> > > > > > > >>> > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren < >>> cla...@xenei.com> >>> > > > wrote: >>> > > > > > > >>> > > > > > > > Took me awhile to find it but the link to the KIP is >>> > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > >>> > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls >>> > > > > > > > >>> > > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani < >>> > > > murali.bas...@gmail.com >>> > > > > > > >>> > > > > > > > wrote: >>> > > > > > > > >>> > > > > > > > > Hello, >>> > > > > > > > > >>> > > > > > > > > I'd like to propose a suggestion to our resource >>> patterns in >>> > > > Kafka >>> > > > > > > ACLs. >>> > > > > > > > > >>> > > > > > > > > Currently, when adding new ACLs in Kafka, we have two >>> types of >>> > > > > > resource >>> > > > > > > > > patterns for topics: >>> > > > > > > > > >>> > > > > > > > > - LITERAL >>> > > > > > > > > - PREFIXED >>> > > > > > > > > >>> > > > > > > > > However, when it comes to listing or removing ACLs, we >>> have a >>> > > > couple >>> > > > > > > more >>> > > > > > > > > options: >>> > > > > > > > > >>> > > > > > > > > - MATCH >>> > > > > > > > > - ANY (will match any pattern type) >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > If we can extend creating acls as well with 'MATCH' >>> pattern >>> > > > type, it >>> > > > > > > > would >>> > > > > > > > > be very beneficial. Even though this kind of acl should >>> be >>> > > > created >>> > > > > > with >>> > > > > > > > > utmost care, it will help organizations streamline their >>> ACL >>> > > > > > management >>> > > > > > > > > processes. >>> > > > > > > > > >>> > > > > > > > > Example scenarios : >>> > > > > > > > > >>> > > > > > > > > Let's say we need to create ACLs for the following six >>> topics: >>> > > > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic, >>> > > > > > > de-accounts-localtopic, >>> > > > > > > > > de-accounts-remotetopic, cz-accounts-localtopic, >>> > > > > > > cz-accounts-remotetopic >>> > > > > > > > > >>> > > > > > > > > Currently, we achieve this using existing functionality >>> by >>> > > > creating >>> > > > > > > three >>> > > > > > > > > prefixed ACLs as shown below: >>> > > > > > > > > >>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ >>> > > > > > > > > > --add \ >>> > > > > > > > > > --allow-principal >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > >>> > > >>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown >>> > > > > > > > > > \ >>> > > > > > > > > > --producer \ >>> > > > > > > > > > --topic nl-accounts- \ >>> > > > > > > > > > --resource-pattern-type prefixed >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ >>> > > > > > > > > > --add \ >>> > > > > > > > > > --allow-principal >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > >>> > > >>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown >>> > > > > > > > > > \ >>> > > > > > > > > > --producer \ >>> > > > > > > > > > --topic de-accounts- \ >>> > > > > > > > > > --resource-pattern-type prefixed >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ >>> > > > > > > > > > --add \ >>> > > > > > > > > > --allow-principal >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > >>> > > >>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown >>> > > > > > > > > > \ >>> > > > > > > > > > --producer \ >>> > > > > > > > > > --topic cz-accounts- \ >>> > > > > > > > > > --resource-pattern-type prefixed >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > However, if we had the 'MATCH' pattern type available, >>> we could >>> > > > > > > > accomplish >>> > > > > > > > > this with a single ACL, as illustrated here: >>> > > > > > > > > >>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ >>> > > > > > > > > > --add \ >>> > > > > > > > > > --allow-principal >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > >>> > > >>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown >>> > > > > > > > > > \ >>> > > > > > > > > > --producer \ >>> > > > > > > > > > --topic *-accounts-* \ >>> > > > > > > > > > --resource-pattern-type match >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > This pattern closely resembles PREFIXED but offers >>> broader >>> > > > allow/deny >>> > > > > > > > > rules. >>> > > > > > > > > >>> > > > > > > > > Implementing this change could significantly reduce the >>> effort >>> > > in >>> > > > > > > several >>> > > > > > > > > acl management processes. >>> > > > > > > > > >>> > > > > > > > > I welcome your thoughts and any concerns you may have >>> regarding >>> > > > this >>> > > > > > > > > proposal. >>> > > > > > > > > >>> > > > > > > > > Thanks, >>> > > > > > > > > Murali >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > -- >>> > > > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > > >>> > > > > > -- >>> > > > > > ======================== >>> > > > > > Okada Haruki >>> > > > > > ocadar...@gmail.com >>> > > > > > ======================== >>> > > > > > >>> > > > > >>> > > > > >>> > > > > -- >>> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren >>> > > > >>> > > >>> > > >>> > > -- >>> > > LinkedIn: http://www.linkedin.com/in/claudewarren >>> > > >>> >>