Updated KIP with the provided suggestions from Greg and Claude.

We shall optimize matching acls with Trie based solutions for PREFIXED and
LITERAL acls. Basically we define trie structures, populate trie and
retrieve acls which seems very efficient.
This way, we aim to reduce the time to authorize every call even after
introducing MATCH.

On Thu, May 16, 2024 at 12:38 PM Claude Warren <cla...@xenei.com> wrote:

> I think that a Trie solution for LITERAL and PREFIX would be faster than
> attempting to use the wildcard search strategies on them.  The reasons
> being
>
>    - that the wildcard search strategy may return some non-matching (false
>    positive) patterns that have to be checked and ignored.
>    - the size of the Bloom filters used for searching may be artificially
>    inflated because of the length of the LITERAL matches (PREFIX matches
> too
>    but LITERAL will probably be longer anyway)
>    - The WILDCARD type will require a Bloom filter and it should be created
>    once and accept the 80-100 byte of overhead rather than the calculation
>    time on every use.
>
>
>
> On Wed, May 15, 2024 at 5:00 PM Murali Basani <murali.bas...@gmail.com>
> wrote:
>
> > Thank you Haruki for the feedback.
> > There are certainly many such use cases where customers ended up creating
> > custom authorizers to handle these MATCH based patterns.
> > And regarding updating method matchingAcls, Claude updated the KIP
> > mentioning an approach, which definitely optimizes the flow, but can be
> > discussed further during the PR implementation I believe.
> >
> > Thank you Greg for the feedback.
> > I understand with trie, existing mechanisms for PREFIX/LITERAL can be
> > optimized. How about even trying this matchingAcls with a new
> > implementation which Claude proposed, may be we can improve the existing
> > flows. Probably we can see those details during PR development.
> >
> > If we don't achieve the expected results as per AuthorizerBenchmark, we
> can
> > drop this kip.
> >
> > And thank you Claude for the suggestion on the new implementation.
> >
> > On Tue, May 7, 2024 at 4:37 PM Claude Warren, Jr
> > <claude.war...@aiven.io.invalid> wrote:
> >
> > > I have updated KIP-1042 with a proposal for how to reduce the time
> spent
> > > looking for matching wildcard patterns.  Experimentally I see a
> reduction
> > > of 66-75% execution time.
> > >
> > > On Mon, May 6, 2024 at 9:03 PM Greg Harris
> <greg.har...@aiven.io.invalid
> > >
> > > wrote:
> > >
> > > > Hi Murali,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > I think I understand the motivation for this KIP in situations where
> > > > there are a "cross product" of topics for two or more variables X and
> > > > Y, and want to write ACLs for each of the variable axes.
> > > > If you format your topics "X-Y-suffix", it's not easy to write rules
> > > > that apply to all "Y" topics, because you need to enumerate all of
> the
> > > > "X" values, and the problem persists even if you reorder the topic
> > > > name.
> > > >
> > > > In my recent work on KIP-986 I found it necessary to introduce
> > > > "namespaces" to group topics together, and I was going to replicate
> > > > the ACL system to specify those namespaces. This change to the ACL
> > > > system could increase the expressiveness and complexity of that
> > > > feature, if it is ever implemented.
> > > > One of the primitives I needed when specifying namespaces was the
> > > > ability to tell when two namespaces overlapped (i.e. does there exist
> > > > any topic which is present in both namespaces). This is trivial to do
> > > > with the current PREFIX and LITERAL system, as we can find the
> > > > maximum-length common prefix with just some length comparisons and
> > > > equality checks.
> > > > I considered specifying namespaces via regular expressions, and found
> > > > that it was computationally much more difficult. Computing the
> > > > intersection of two regexes appears to be exponential in the length
> of
> > > > the regexes, leading me to avoid adding it.
> > > >
> > > > I understand that you're not suggesting full REGEX support, and that
> > > > "namespaces" don't need to support MATCH, but I think MATCH may run
> > > > into related difficulties. Any MATCH can overlap with any other MATCH
> > > > or PREFIX if it includes a sufficient number of wildcards. For
> > > > example:
> > > > MATCH *-accounts-* has overlap with PREFIX nl as they can both match
> > > > "nl-accounts-localtopic", but that isn't sensitive to the contents
> > > > "nl", it is true for any PREFIX.
> > > > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can
> > > > both match "nl-accounts-localtopic", but that isn't actually
> sensitive
> > > > to the contents "localtopic", it's true for any MATCH which includes
> a
> > > > wildcard at the beginning.
> > > >
> > > > This has implications for execution complexity: If we can't compute
> > > > whether two patterns overlap, then we need to run both of them on
> each
> > > > piece of input to test if they both match. Under the current
> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but
> that
> > > > option wouldn't be available to us with MATCH.
> > > >
> > > > The current system makes users evaluate a trade-off:
> > > > 1. Optimize the number of ACLs by organizing topics according to
> > > > prefixes (for example, "accounts-localtopic-nl" and PREFIX
> "accounts",
> > > > PREFIX "accounts-localtopic")
> > > > 2. Use less-structured topic names, with a corresponding ACL scheme
> > > > that has more individual rules.
> > > > The system currently informs users of this tradeoff by making them
> > > > write multiple ACLs, and making them think "there has got to be a
> > > > better way!". Perhaps we can find a better way to surface this best
> > > > practice, or better inform users about it.
> > > >
> > > > I understand that there are going to be situations more complex than
> > > > your example, where multiple individual rules will always be
> necessary
> > > > with only PREFIX evaluation. I think even in those situations, a
> > > > number of efficient-to-evaluate rules is preferable to just one
> > > > expensive-to-evaluate rule.
> > > >
> > > > One alternative that I thought of could be "PARAMETERIZED" ACLs which
> > > > are like PREFIXED, but allow some parameter substitution. For example
> > > > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here,
> > > > but this isn't actually a regex, and wouldn't allow arbitrary numbers
> > > > of characters, or the * or + operators.
> > > > In the background it could evaluate exactly like the 3 individual
> > > > PREFIX rules, but be easier to evaluate on the backend, and support
> > > > the intersection query I mentioned earlier. It could also support
> > > > [a-zA-Z] notation in case the parameter values aren't known ahead of
> > > > time, but have a fixed length.
> > > >
> > > > Thanks,
> > > > Greg
> > > >
> > > > On Mon, May 6, 2024 at 11:17 AM Claude Warren <cla...@xenei.com>
> > wrote:
> > > > >
> > > > > I have an idea for how to reduce the time for ACL lookups in
> general
> > > and
> > > > > particularly where wildcards are involved using sequence
> > > > > characterization techniques from bioinformatics.  But I need a set
> of
> > > ACL
> > > > > patterns and associated topics to test with.
> > > > >
> > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <ocadar...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi, Murali.
> > > > > >
> > > > > > First, could you add the KIP-1042 to the index (
> > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > > > > )
> > > > > > as well so that everyone can find it easily?
> > > > > >
> > > > > > I took a look at the KIP, then I have 2 questions:
> > > > > >
> > > > > > 1. Though the new MATCH resource pattern type may reduce the
> effort
> > > of
> > > > > > adding ACLs in some cases, do you have any concrete use case you
> > are
> > > in
> > > > > > mind? (When prefixed ACL was introduced in KIP-290, there was a
> > > > use-case
> > > > > > that using it for implementing multi-tenancy)
> > > > > >
> > > > > > 2. As you may know, ACL lookup is in the hot-path which the
> > > > performance is
> > > > > > very important. (
> > > > > >
> > > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
> > > > > > ).
> > > > > > Do you have ideas how do we update `matchingAcls` to support
> > > > MATCH-type ACL
> > > > > > without introducing performance issue?
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <claude.war...@aiven.io
> > > .invalid>:
> > > > > >
> > > > > > > As I wrote in [1], the ACL evaluation algorithm needs to be
> > > specified
> > > > > > with
> > > > > > > respect to the specificity of the pattern so that we know
> exactly
> > > > which
> > > > > > if
> > > > > > > *-accounts-* takes precedence over nl-accounts-* or visa versa.
> > > > > > >
> > > > > > > I think that we should spell out that precedence is evaluated
> as
> > > > follows:
> > > > > > >
> > > > > > > 1. Remove patterns that do not match
> > > > > > > 2. More specific patterns take precedence over less specific
> > > patterns
> > > > > > > 3. for patterns of the same precedence DENY overrides ALLOW
> > > > > > >
> > > > > > > Determining specificity:
> > > > > > >
> > > > > > > Specificity is based on the Levenshtein distance between the
> > > pattern
> > > > and
> > > > > > > the text being evaluated. The lower the distance the more
> > specific
> > > > the
> > > > > > > rule.
> > > > > > > Using the topic name: nl-accounts-localtopic we can evaluate
> the
> > > > > > > Levenshtein distance for various patterns.
> > > > > > > nl-accounts-localtopic = 0
> > > > > > > *-accounts-localtopic = 2
> > > > > > > nl-accounts-local* = 5
> > > > > > > *-accounts-local* = 7
> > > > > > > nl-accounts-* = 10
> > > > > > > *-accounts-* = 12
> > > > > > >
> > > > > > > In the special case of matching principles User matches are
> more
> > > > specific
> > > > > > > than Group matches.
> > > > > > >
> > > > > > > I don't know if this should be added to KIP-1042 or presented
> as
> > a
> > > > new
> > > > > > KIP.
> > > > > > >
> > > > > > > Claude
> > > > > > >
> > > > > > > [1]
> > > https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
> > > > > > > <
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > > > > >
> > > > > > >
> > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <
> cla...@xenei.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Took me awhile to find it but the link to the KIP is
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > > > > >
> > > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani <
> > > > murali.bas...@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > I'd like to propose a suggestion to our resource patterns
> in
> > > > Kafka
> > > > > > > ACLs.
> > > > > > > > >
> > > > > > > > > Currently, when adding new ACLs in Kafka, we have two types
> > of
> > > > > > resource
> > > > > > > > > patterns for topics:
> > > > > > > > >
> > > > > > > > >    - LITERAL
> > > > > > > > >    - PREFIXED
> > > > > > > > >
> > > > > > > > > However, when it comes to listing or removing ACLs, we
> have a
> > > > couple
> > > > > > > more
> > > > > > > > > options:
> > > > > > > > >
> > > > > > > > >    - MATCH
> > > > > > > > >    - ANY (will match any pattern type)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > If we can extend creating acls as well with 'MATCH' pattern
> > > > type, it
> > > > > > > > would
> > > > > > > > > be very beneficial. Even though this kind of acl should be
> > > > created
> > > > > > with
> > > > > > > > > utmost care, it will help organizations streamline their
> ACL
> > > > > > management
> > > > > > > > > processes.
> > > > > > > > >
> > > > > > > > > Example scenarios :
> > > > > > > > >
> > > > > > > > > Let's say we need to create ACLs for the following six
> > topics:
> > > > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic,
> > > > > > > de-accounts-localtopic,
> > > > > > > > > de-accounts-remotetopic, cz-accounts-localtopic,
> > > > > > > cz-accounts-remotetopic
> > > > > > > > >
> > > > > > > > > Currently, we achieve this using existing functionality by
> > > > creating
> > > > > > > three
> > > > > > > > > prefixed ACLs as shown below:
> > > > > > > > >
> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > > > --add \
> > > > > > > > > > --allow-principal
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > > > \
> > > > > > > > > > --producer \
> > > > > > > > > > --topic nl-accounts- \
> > > > > > > > > > --resource-pattern-type prefixed
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > > > --add \
> > > > > > > > > > --allow-principal
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > > > \
> > > > > > > > > > --producer \
> > > > > > > > > > --topic de-accounts- \
> > > > > > > > > > --resource-pattern-type prefixed
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > > > --add \
> > > > > > > > > > --allow-principal
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > > > \
> > > > > > > > > > --producer \
> > > > > > > > > > --topic cz-accounts- \
> > > > > > > > > > --resource-pattern-type prefixed
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > However, if we had the 'MATCH' pattern type available, we
> > could
> > > > > > > > accomplish
> > > > > > > > > this with a single ACL, as illustrated here:
> > > > > > > > >
> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > > > --add \
> > > > > > > > > > --allow-principal
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > > > \
> > > > > > > > > > --producer \
> > > > > > > > > > --topic *-accounts-* \
> > > > > > > > > > --resource-pattern-type match
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This pattern closely resembles PREFIXED but offers broader
> > > > allow/deny
> > > > > > > > > rules.
> > > > > > > > >
> > > > > > > > > Implementing this change could significantly reduce the
> > effort
> > > in
> > > > > > > several
> > > > > > > > > acl management processes.
> > > > > > > > >
> > > > > > > > > I welcome your thoughts and any concerns you may have
> > regarding
> > > > this
> > > > > > > > > proposal.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Murali
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > ========================
> > > > > > Okada Haruki
> > > > > > ocadar...@gmail.com
> > > > > > ========================
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > >
> > >
> >
>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>

Reply via email to