Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

Muralidhar Basani Thu, 23 May 2024 01:20:06 -0700

Thanks Greg.
I have updated KIP with details on optimisation of LITERAL too.


Regarding limitations in performance by introducing MATCH is definitely a
question.
- By optimizing LITERAL and PREFIX we are gaining a few nano secs I think.
(described in kip)
- MATCH can be introduced only with a configuration. So by default, we can
disable the MATCH check (ex : acls.pattern.match.enable=false), and if
customer enables the config, only then add the lookup in the matchingAcls()
- With the proposal already described in KIP for MATCH, we may not have any
limitations., rather will be efficient.

Maybe we are in good shape with these propositions

Thanks,
Murali



On Tue, May 21, 2024 at 6:15 PM Greg Harris <[email protected]>
wrote:

> Hi Murali,
>
> I don't have a trie library in mind. I looked at the current
> implementation of the StandardAuthorizer and found that we are already
> benefiting from the prefix structure in the implementation [1]. The
> current implementation appears to be a TreePSet [2].
>
> Now, we've already made this tradeoff once with PREFIX: Prefixes are
> less structured than literals, because with literals you can use a
> hashing algorithm to jump directly to your relevant ACLs in O(1), but
> with a prefix you either need to do multiple lookups, or some sort of
> O(log(n)) lookup. And we determined that the ultimate limitation in
> performance was worth it for the expressiveness.
> We're making this tradeoff again with MATCH acls: wildcards are less
> structured than prefixes or literals, because of the reasons I
> mentioned earlier. We need to judge now if the ultimate limitation in
> performance is worth it.
>
> I think your strategy for using the optimized layout for prefix and
> literal matches is smart, because there does seem to be a gap in
> performance possible. It makes me wonder why the optimized layout for
> literals was not used when prefixes were added. Literal lookups still
> go through the tree lookup, when they could be moved to a hash-lookup
> instead.
> That would allow users to "choose" for themselves on a convenience vs
> performance scale: Smaller use-cases can add a single convenient
> MATCH, and larger use-cases can add the multiple optimized PREFIXes.
>
> [1]
> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339
> [2]
> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34
>
> Thanks,
> Greg
>
> On Tue, May 21, 2024 at 8:23 AM Muralidhar Basani
> <[email protected]> wrote:
> >
> > @greg which library of trie you were thinking of ? There is one in the
> > commons-collection I see.
> >
> > On Fri, May 17, 2024 at 3:55 PM Claude Warren <[email protected]> wrote:
> >
> > > >
> > > > This has implications for execution complexity: If we can't compute
> > > > whether two patterns overlap, then we need to run both of them on
> each
> > > > piece of input to test if they both match. Under the current
> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but
> that
> > > > option wouldn't be available to us with MATCH.
> > > >
> > >
> > > If we consider the case of an asterisk representing 1 or more
> characters
> > > then determining if 2 patterns overlap can be fairly simple and very
> fast.
> > > Let's call the text from the ACL the target, and the text from the
> wildcard
> > > matcher the candidate.
> > >
> > > When a wildcard pattern, excluding '*',  is created:
> > >
> > >    - the candidate text is broken into fragments separated by
> characters
> > >    that are not Character.isLetterOrDigit() (See
> > >    https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html
> ).
> > >    - fragments that contain 1 character are ignored.
> > >    - fragments that contains 2 or more characters are scanned and every
> > >    every pair of characters used to create a Hasher (See
> > >
> > >
> https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html
> > > )
> > >    that hasher is added to a Bloom filter associated with the wildcard
> > > pattern.
> > >
> > > When a target is being evaluated and matching wildcard entries are to
> be
> > > located. Split and create a bloom filter using entry and the same
> strategy
> > > as for the wildcard patterns above.  These bloom filters will have had
> more
> > > pairs of characters added than for a matching wildcard pattern.
> > >
> > > Each filter contains the pattern for the unique pairs in the fragments
> of
> > > the original text:
> > >
> > > Now we can check the Bloom filters.
> > >
> > > To find potential matching patterns we can check to see if the Bloom
> filter
> > > for the pattern is contained within the bloom filter for the entry
> text.
> > > If so then we know that it is likely that the pairs of characters
> specified
> > > in the wild card (non-wildcard text) appear in the entry text.
> > >
> > > At this point we can evaluate the reduced number of patterns to see
> which
> > > ones actually match.
> > >
> > > Having reduced the candidates to the matching patterns we can then use
> a
> > > standard measure of similarity like the Levenshtein distance to
> determine
> > > which candidates require the fewest edits to evolve into the target.
> The
> > > one with the fewest edits is the most specific and should be applied.
> > >
> > > Now if you want to know which patterns overlap you have a similar
> process.
> > >
> > > Each Bloom filter can calculate the estimated number of unique
> fragments
> > > that were added to it.  If the filters are sorted by this estimation,
> the
> > > ones with the highest counts may contain the ones with the lowest
> counts,
> > > but a lower count can never contain a higher count.  The calculation to
> > > perform the estimation can be skipped and the cardinality value of each
> > > Bloom filter can be used instead.  You can then check the smaller
> filters
> > > against the larger ones and find where one candidate is contained
> within
> > > another.
> > >
> > > If you want to know if they intersect the BloomFilter from
> > > commons-collections has an estimateIntersection as well as an
> estimateUnion
> > > method.  Quick tests can be made between filters to see if there is any
> > > overlap before more complex analysis is performed.
> > >
> > > The solution here does not make the final comparisons easier, it simply
> > > reduces the search space to find the items that need to be compared.
> > >
> > >
> > > On Mon, May 6, 2024 at 9:03 PM Greg Harris
> <[email protected]>
> > > wrote:
> > >
> > > > Hi Murali,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > I think I understand the motivation for this KIP in situations where
> > > > there are a "cross product" of topics for two or more variables X and
> > > > Y, and want to write ACLs for each of the variable axes.
> > > > If you format your topics "X-Y-suffix", it's not easy to write rules
> > > > that apply to all "Y" topics, because you need to enumerate all of
> the
> > > > "X" values, and the problem persists even if you reorder the topic
> > > > name.
> > > >
> > > > In my recent work on KIP-986 I found it necessary to introduce
> > > > "namespaces" to group topics together, and I was going to replicate
> > > > the ACL system to specify those namespaces. This change to the ACL
> > > > system could increase the expressiveness and complexity of that
> > > > feature, if it is ever implemented.
> > > > One of the primitives I needed when specifying namespaces was the
> > > > ability to tell when two namespaces overlapped (i.e. does there exist
> > > > any topic which is present in both namespaces). This is trivial to do
> > > > with the current PREFIX and LITERAL system, as we can find the
> > > > maximum-length common prefix with just some length comparisons and
> > > > equality checks.
> > > > I considered specifying namespaces via regular expressions, and found
> > > > that it was computationally much more difficult. Computing the
> > > > intersection of two regexes appears to be exponential in the length
> of
> > > > the regexes, leading me to avoid adding it.
> > > >
> > > > I understand that you're not suggesting full REGEX support, and that
> > > > "namespaces" don't need to support MATCH, but I think MATCH may run
> > > > into related difficulties. Any MATCH can overlap with any other MATCH
> > > > or PREFIX if it includes a sufficient number of wildcards. For
> > > > example:
> > > > MATCH *-accounts-* has overlap with PREFIX nl as they can both match
> > > > "nl-accounts-localtopic", but that isn't sensitive to the contents
> > > > "nl", it is true for any PREFIX.
> > > > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can
> > > > both match "nl-accounts-localtopic", but that isn't actually
> sensitive
> > > > to the contents "localtopic", it's true for any MATCH which includes
> a
> > > > wildcard at the beginning.
> > > >
> > > > This has implications for execution complexity: If we can't compute
> > > > whether two patterns overlap, then we need to run both of them on
> each
> > > > piece of input to test if they both match. Under the current
> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but
> that
> > > > option wouldn't be available to us with MATCH.
> > > >
> > > > The current system makes users evaluate a trade-off:
> > > > 1. Optimize the number of ACLs by organizing topics according to
> > > > prefixes (for example, "accounts-localtopic-nl" and PREFIX
> "accounts",
> > > > PREFIX "accounts-localtopic")
> > > > 2. Use less-structured topic names, with a corresponding ACL scheme
> > > > that has more individual rules.
> > > > The system currently informs users of this tradeoff by making them
> > > > write multiple ACLs, and making them think "there has got to be a
> > > > better way!". Perhaps we can find a better way to surface this best
> > > > practice, or better inform users about it.
> > > >
> > > > I understand that there are going to be situations more complex than
> > > > your example, where multiple individual rules will always be
> necessary
> > > > with only PREFIX evaluation. I think even in those situations, a
> > > > number of efficient-to-evaluate rules is preferable to just one
> > > > expensive-to-evaluate rule.
> > > >
> > > > One alternative that I thought of could be "PARAMETERIZED" ACLs which
> > > > are like PREFIXED, but allow some parameter substitution. For example
> > > > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here,
> > > > but this isn't actually a regex, and wouldn't allow arbitrary numbers
> > > > of characters, or the * or + operators.
> > > > In the background it could evaluate exactly like the 3 individual
> > > > PREFIX rules, but be easier to evaluate on the backend, and support
> > > > the intersection query I mentioned earlier. It could also support
> > > > [a-zA-Z] notation in case the parameter values aren't known ahead of
> > > > time, but have a fixed length.
> > > >
> > > > Thanks,
> > > > Greg
> > > >
> > > > On Mon, May 6, 2024 at 11:17 AM Claude Warren <[email protected]>
> wrote:
> > > > >
> > > > > I have an idea for how to reduce the time for ACL lookups in
> general
> > > and
> > > > > particularly where wildcards are involved using sequence
> > > > > characterization techniques from bioinformatics.  But I need a set
> of
> > > ACL
> > > > > patterns and associated topics to test with.
> > > > >
> > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <[email protected]>
> > > wrote:
> > > > >
> > > > > > Hi, Murali.
> > > > > >
> > > > > > First, could you add the KIP-1042 to the index (
> > > > > >
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > > > > )
> > > > > > as well so that everyone can find it easily?
> > > > > >
> > > > > > I took a look at the KIP, then I have 2 questions:
> > > > > >
> > > > > > 1. Though the new MATCH resource pattern type may reduce the
> effort
> > > of
> > > > > > adding ACLs in some cases, do you have any concrete use case you
> are
> > > in
> > > > > > mind? (When prefixed ACL was introduced in KIP-290, there was a
> > > > use-case
> > > > > > that using it for implementing multi-tenancy)
> > > > > >
> > > > > > 2. As you may know, ACL lookup is in the hot-path which the
> > > > performance is
> > > > > > very important. (
> > > > > >
> > > > > >
> > > >
> > >
> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
> > > > > > ).
> > > > > > Do you have ideas how do we update `matchingAcls` to support
> > > > MATCH-type ACL
> > > > > > without introducing performance issue?
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <[email protected]
> > > .invalid>:
> > > > > >
> > > > > > > As I wrote in [1], the ACL evaluation algorithm needs to be
> > > specified
> > > > > > with
> > > > > > > respect to the specificity of the pattern so that we know
> exactly
> > > > which
> > > > > > if
> > > > > > > *-accounts-* takes precedence over nl-accounts-* or visa versa.
> > > > > > >
> > > > > > > I think that we should spell out that precedence is evaluated
> as
> > > > follows:
> > > > > > >
> > > > > > > 1. Remove patterns that do not match
> > > > > > > 2. More specific patterns take precedence over less specific
> > > patterns
> > > > > > > 3. for patterns of the same precedence DENY overrides ALLOW
> > > > > > >
> > > > > > > Determining specificity:
> > > > > > >
> > > > > > > Specificity is based on the Levenshtein distance between the
> > > pattern
> > > > and
> > > > > > > the text being evaluated. The lower the distance the more
> specific
> > > > the
> > > > > > > rule.
> > > > > > > Using the topic name: nl-accounts-localtopic we can evaluate
> the
> > > > > > > Levenshtein distance for various patterns.
> > > > > > > nl-accounts-localtopic = 0
> > > > > > > *-accounts-localtopic = 2
> > > > > > > nl-accounts-local* = 5
> > > > > > > *-accounts-local* = 7
> > > > > > > nl-accounts-* = 10
> > > > > > > *-accounts-* = 12
> > > > > > >
> > > > > > > In the special case of matching principles User matches are
> more
> > > > specific
> > > > > > > than Group matches.
> > > > > > >
> > > > > > > I don't know if this should be added to KIP-1042 or presented
> as a
> > > > new
> > > > > > KIP.
> > > > > > >
> > > > > > > Claude
> > > > > > >
> > > > > > > [1]
> > > https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
> > > > > > > <
> > > > > > >
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > > > > >
> > > > > > >
> > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <
> [email protected]>
> > > > wrote:
> > > > > > >
> > > > > > > > Took me awhile to find it but the link to the KIP is
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > > > > >
> > > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani <
> > > > [email protected]
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > I'd like to propose a suggestion to our resource patterns
> in
> > > > Kafka
> > > > > > > ACLs.
> > > > > > > > >
> > > > > > > > > Currently, when adding new ACLs in Kafka, we have two
> types of
> > > > > > resource
> > > > > > > > > patterns for topics:
> > > > > > > > >
> > > > > > > > >    - LITERAL
> > > > > > > > >    - PREFIXED
> > > > > > > > >
> > > > > > > > > However, when it comes to listing or removing ACLs, we
> have a
> > > > couple
> > > > > > > more
> > > > > > > > > options:
> > > > > > > > >
> > > > > > > > >    - MATCH
> > > > > > > > >    - ANY (will match any pattern type)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > If we can extend creating acls as well with 'MATCH' pattern
> > > > type, it
> > > > > > > > would
> > > > > > > > > be very beneficial. Even though this kind of acl should be
> > > > created
> > > > > > with
> > > > > > > > > utmost care, it will help organizations streamline their
> ACL
> > > > > > management
> > > > > > > > > processes.
> > > > > > > > >
> > > > > > > > > Example scenarios :
> > > > > > > > >
> > > > > > > > > Let's say we need to create ACLs for the following six
> topics:
> > > > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic,
> > > > > > > de-accounts-localtopic,
> > > > > > > > > de-accounts-remotetopic, cz-accounts-localtopic,
> > > > > > > cz-accounts-remotetopic
> > > > > > > > >
> > > > > > > > > Currently, we achieve this using existing functionality by
> > > > creating
> > > > > > > three
> > > > > > > > > prefixed ACLs as shown below:
> > > > > > > > >
> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > > > --add \
> > > > > > > > > > --allow-principal
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > > > \
> > > > > > > > > > --producer \
> > > > > > > > > > --topic nl-accounts- \
> > > > > > > > > > --resource-pattern-type prefixed
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > > > --add \
> > > > > > > > > > --allow-principal
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > > > \
> > > > > > > > > > --producer \
> > > > > > > > > > --topic de-accounts- \
> > > > > > > > > > --resource-pattern-type prefixed
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > > > --add \
> > > > > > > > > > --allow-principal
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > > > \
> > > > > > > > > > --producer \
> > > > > > > > > > --topic cz-accounts- \
> > > > > > > > > > --resource-pattern-type prefixed
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > However, if we had the 'MATCH' pattern type available, we
> could
> > > > > > > > accomplish
> > > > > > > > > this with a single ACL, as illustrated here:
> > > > > > > > >
> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > > > --add \
> > > > > > > > > > --allow-principal
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > > > \
> > > > > > > > > > --producer \
> > > > > > > > > > --topic *-accounts-* \
> > > > > > > > > > --resource-pattern-type match
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This pattern closely resembles PREFIXED but offers broader
> > > > allow/deny
> > > > > > > > > rules.
> > > > > > > > >
> > > > > > > > > Implementing this change could significantly reduce the
> effort
> > > in
> > > > > > > several
> > > > > > > > > acl management processes.
> > > > > > > > >
> > > > > > > > > I welcome your thoughts and any concerns you may have
> regarding
> > > > this
> > > > > > > > > proposal.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Murali
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > ========================
> > > > > > Okada Haruki
> > > > > > [email protected]
> > > > > > ========================
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > >
> > >
> > >
> > > --
> > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > >
>

Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

Reply via email to