I have updated KIP-1042 with a proposal for how to reduce the time spent
looking for matching wildcard patterns.  Experimentally I see a reduction
of 66-75% execution time.

On Mon, May 6, 2024 at 9:03 PM Greg Harris <greg.har...@aiven.io.invalid>
wrote:

> Hi Murali,
>
> Thanks for the KIP!
>
> I think I understand the motivation for this KIP in situations where
> there are a "cross product" of topics for two or more variables X and
> Y, and want to write ACLs for each of the variable axes.
> If you format your topics "X-Y-suffix", it's not easy to write rules
> that apply to all "Y" topics, because you need to enumerate all of the
> "X" values, and the problem persists even if you reorder the topic
> name.
>
> In my recent work on KIP-986 I found it necessary to introduce
> "namespaces" to group topics together, and I was going to replicate
> the ACL system to specify those namespaces. This change to the ACL
> system could increase the expressiveness and complexity of that
> feature, if it is ever implemented.
> One of the primitives I needed when specifying namespaces was the
> ability to tell when two namespaces overlapped (i.e. does there exist
> any topic which is present in both namespaces). This is trivial to do
> with the current PREFIX and LITERAL system, as we can find the
> maximum-length common prefix with just some length comparisons and
> equality checks.
> I considered specifying namespaces via regular expressions, and found
> that it was computationally much more difficult. Computing the
> intersection of two regexes appears to be exponential in the length of
> the regexes, leading me to avoid adding it.
>
> I understand that you're not suggesting full REGEX support, and that
> "namespaces" don't need to support MATCH, but I think MATCH may run
> into related difficulties. Any MATCH can overlap with any other MATCH
> or PREFIX if it includes a sufficient number of wildcards. For
> example:
> MATCH *-accounts-* has overlap with PREFIX nl as they can both match
> "nl-accounts-localtopic", but that isn't sensitive to the contents
> "nl", it is true for any PREFIX.
> MATCH *-accounts-* has overlap with MATCH *localtopic, as they can
> both match "nl-accounts-localtopic", but that isn't actually sensitive
> to the contents "localtopic", it's true for any MATCH which includes a
> wildcard at the beginning.
>
> This has implications for execution complexity: If we can't compute
> whether two patterns overlap, then we need to run both of them on each
> piece of input to test if they both match. Under the current
> LITERAL/PREFIX system, we can optimize execution with a trie, but that
> option wouldn't be available to us with MATCH.
>
> The current system makes users evaluate a trade-off:
> 1. Optimize the number of ACLs by organizing topics according to
> prefixes (for example, "accounts-localtopic-nl" and PREFIX "accounts",
> PREFIX "accounts-localtopic")
> 2. Use less-structured topic names, with a corresponding ACL scheme
> that has more individual rules.
> The system currently informs users of this tradeoff by making them
> write multiple ACLs, and making them think "there has got to be a
> better way!". Perhaps we can find a better way to surface this best
> practice, or better inform users about it.
>
> I understand that there are going to be situations more complex than
> your example, where multiple individual rules will always be necessary
> with only PREFIX evaluation. I think even in those situations, a
> number of efficient-to-evaluate rules is preferable to just one
> expensive-to-evaluate rule.
>
> One alternative that I thought of could be "PARAMETERIZED" ACLs which
> are like PREFIXED, but allow some parameter substitution. For example
> PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here,
> but this isn't actually a regex, and wouldn't allow arbitrary numbers
> of characters, or the * or + operators.
> In the background it could evaluate exactly like the 3 individual
> PREFIX rules, but be easier to evaluate on the backend, and support
> the intersection query I mentioned earlier. It could also support
> [a-zA-Z] notation in case the parameter values aren't known ahead of
> time, but have a fixed length.
>
> Thanks,
> Greg
>
> On Mon, May 6, 2024 at 11:17 AM Claude Warren <cla...@xenei.com> wrote:
> >
> > I have an idea for how to reduce the time for ACL lookups in general and
> > particularly where wildcards are involved using sequence
> > characterization techniques from bioinformatics.  But I need a set of ACL
> > patterns and associated topics to test with.
> >
> > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <ocadar...@gmail.com> wrote:
> >
> > > Hi, Murali.
> > >
> > > First, could you add the KIP-1042 to the index (
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > )
> > > as well so that everyone can find it easily?
> > >
> > > I took a look at the KIP, then I have 2 questions:
> > >
> > > 1. Though the new MATCH resource pattern type may reduce the effort of
> > > adding ACLs in some cases, do you have any concrete use case you are in
> > > mind? (When prefixed ACL was introduced in KIP-290, there was a
> use-case
> > > that using it for implementing multi-tenancy)
> > >
> > > 2. As you may know, ACL lookup is in the hot-path which the
> performance is
> > > very important. (
> > >
> > >
> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
> > > ).
> > > Do you have ideas how do we update `matchingAcls` to support
> MATCH-type ACL
> > > without introducing performance issue?
> > >
> > >
> > > Thanks,
> > >
> > > 2024年5月3日(金) 19:51 Claude Warren, Jr <claude.war...@aiven.io.invalid>:
> > >
> > > > As I wrote in [1], the ACL evaluation algorithm needs to be specified
> > > with
> > > > respect to the specificity of the pattern so that we know exactly
> which
> > > if
> > > > *-accounts-* takes precedence over nl-accounts-* or visa versa.
> > > >
> > > > I think that we should spell out that precedence is evaluated as
> follows:
> > > >
> > > > 1. Remove patterns that do not match
> > > > 2. More specific patterns take precedence over less specific patterns
> > > > 3. for patterns of the same precedence DENY overrides ALLOW
> > > >
> > > > Determining specificity:
> > > >
> > > > Specificity is based on the Levenshtein distance between the pattern
> and
> > > > the text being evaluated. The lower the distance the more specific
> the
> > > > rule.
> > > > Using the topic name: nl-accounts-localtopic we can evaluate the
> > > > Levenshtein distance for various patterns.
> > > > nl-accounts-localtopic = 0
> > > > *-accounts-localtopic = 2
> > > > nl-accounts-local* = 5
> > > > *-accounts-local* = 7
> > > > nl-accounts-* = 10
> > > > *-accounts-* = 12
> > > >
> > > > In the special case of matching principles User matches are more
> specific
> > > > than Group matches.
> > > >
> > > > I don't know if this should be added to KIP-1042 or presented as a
> new
> > > KIP.
> > > >
> > > > Claude
> > > >
> > > > [1] https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
> > > > <
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > >
> > > >
> > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <cla...@xenei.com>
> wrote:
> > > >
> > > > > Took me awhile to find it but the link to the KIP is
> > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > >
> > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani <
> murali.bas...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to propose a suggestion to our resource patterns in
> Kafka
> > > > ACLs.
> > > > > >
> > > > > > Currently, when adding new ACLs in Kafka, we have two types of
> > > resource
> > > > > > patterns for topics:
> > > > > >
> > > > > >    - LITERAL
> > > > > >    - PREFIXED
> > > > > >
> > > > > > However, when it comes to listing or removing ACLs, we have a
> couple
> > > > more
> > > > > > options:
> > > > > >
> > > > > >    - MATCH
> > > > > >    - ANY (will match any pattern type)
> > > > > >
> > > > > >
> > > > > > If we can extend creating acls as well with 'MATCH' pattern
> type, it
> > > > > would
> > > > > > be very beneficial. Even though this kind of acl should be
> created
> > > with
> > > > > > utmost care, it will help organizations streamline their ACL
> > > management
> > > > > > processes.
> > > > > >
> > > > > > Example scenarios :
> > > > > >
> > > > > > Let's say we need to create ACLs for the following six topics:
> > > > > > nl-accounts-localtopic, nl-accounts-remotetopic,
> > > > de-accounts-localtopic,
> > > > > > de-accounts-remotetopic, cz-accounts-localtopic,
> > > > cz-accounts-remotetopic
> > > > > >
> > > > > > Currently, we achieve this using existing functionality by
> creating
> > > > three
> > > > > > prefixed ACLs as shown below:
> > > > > >
> > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > --add \
> > > > > > > --allow-principal
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > \
> > > > > > > --producer \
> > > > > > > --topic nl-accounts- \
> > > > > > > --resource-pattern-type prefixed
> > > > > >
> > > > > >
> > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > --add \
> > > > > > > --allow-principal
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > \
> > > > > > > --producer \
> > > > > > > --topic de-accounts- \
> > > > > > > --resource-pattern-type prefixed
> > > > > >
> > > > > >
> > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > --add \
> > > > > > > --allow-principal
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > \
> > > > > > > --producer \
> > > > > > > --topic cz-accounts- \
> > > > > > > --resource-pattern-type prefixed
> > > > > >
> > > > > >
> > > > > > However, if we had the 'MATCH' pattern type available, we could
> > > > > accomplish
> > > > > > this with a single ACL, as illustrated here:
> > > > > >
> > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > --add \
> > > > > > > --allow-principal
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > \
> > > > > > > --producer \
> > > > > > > --topic *-accounts-* \
> > > > > > > --resource-pattern-type match
> > > > > >
> > > > > >
> > > > > >
> > > > > > This pattern closely resembles PREFIXED but offers broader
> allow/deny
> > > > > > rules.
> > > > > >
> > > > > > Implementing this change could significantly reduce the effort in
> > > > several
> > > > > > acl management processes.
> > > > > >
> > > > > > I welcome your thoughts and any concerns you may have regarding
> this
> > > > > > proposal.
> > > > > >
> > > > > > Thanks,
> > > > > > Murali
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > > >
> > > >
> > >
> > >
> > > --
> > > ========================
> > > Okada Haruki
> > > ocadar...@gmail.com
> > > ========================
> > >
> >
> >
> > --
> > LinkedIn: http://www.linkedin.com/in/claudewarren
>

Reply via email to