Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

Muralidhar Basani Tue, 28 May 2024 04:25:35 -0700

Hi all,

Any other suggestions or objections to the proposal?


Thanks,
Murali

On Thu, May 23, 2024 at 10:18 AM Muralidhar Basani <
[email protected]> wrote:

> Thanks Greg.
> I have updated KIP with details on optimisation of LITERAL too.
>
> Regarding limitations in performance by introducing MATCH is definitely a
> question.
> - By optimizing LITERAL and PREFIX we are gaining a few nano secs I think.
> (described in kip)
> - MATCH can be introduced only with a configuration. So by default, we can
> disable the MATCH check (ex : acls.pattern.match.enable=false), and if
> customer enables the config, only then add the lookup in the matchingAcls()
> - With the proposal already described in KIP for MATCH, we may not have
> any limitations., rather will be efficient.
>
> Maybe we are in good shape with these propositions
>
> Thanks,
> Murali
>
>
>
> On Tue, May 21, 2024 at 6:15 PM Greg Harris <[email protected]>
> wrote:
>
>> Hi Murali,
>>
>> I don't have a trie library in mind. I looked at the current
>> implementation of the StandardAuthorizer and found that we are already
>> benefiting from the prefix structure in the implementation [1]. The
>> current implementation appears to be a TreePSet [2].
>>
>> Now, we've already made this tradeoff once with PREFIX: Prefixes are
>> less structured than literals, because with literals you can use a
>> hashing algorithm to jump directly to your relevant ACLs in O(1), but
>> with a prefix you either need to do multiple lookups, or some sort of
>> O(log(n)) lookup. And we determined that the ultimate limitation in
>> performance was worth it for the expressiveness.
>> We're making this tradeoff again with MATCH acls: wildcards are less
>> structured than prefixes or literals, because of the reasons I
>> mentioned earlier. We need to judge now if the ultimate limitation in
>> performance is worth it.
>>
>> I think your strategy for using the optimized layout for prefix and
>> literal matches is smart, because there does seem to be a gap in
>> performance possible. It makes me wonder why the optimized layout for
>> literals was not used when prefixes were added. Literal lookups still
>> go through the tree lookup, when they could be moved to a hash-lookup
>> instead.
>> That would allow users to "choose" for themselves on a convenience vs
>> performance scale: Smaller use-cases can add a single convenient
>> MATCH, and larger use-cases can add the multiple optimized PREFIXes.
>>
>> [1]
>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339
>> [2]
>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34
>>
>> Thanks,
>> Greg
>>
>> On Tue, May 21, 2024 at 8:23 AM Muralidhar Basani
>> <[email protected]> wrote:
>> >
>> > @greg which library of trie you were thinking of ? There is one in the
>> > commons-collection I see.
>> >
>> > On Fri, May 17, 2024 at 3:55 PM Claude Warren <[email protected]> wrote:
>> >
>> > > >
>> > > > This has implications for execution complexity: If we can't compute
>> > > > whether two patterns overlap, then we need to run both of them on
>> each
>> > > > piece of input to test if they both match. Under the current
>> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but
>> that
>> > > > option wouldn't be available to us with MATCH.
>> > > >
>> > >
>> > > If we consider the case of an asterisk representing 1 or more
>> characters
>> > > then determining if 2 patterns overlap can be fairly simple and very
>> fast.
>> > > Let's call the text from the ACL the target, and the text from the
>> wildcard
>> > > matcher the candidate.
>> > >
>> > > When a wildcard pattern, excluding '*',  is created:
>> > >
>> > >    - the candidate text is broken into fragments separated by
>> characters
>> > >    that are not Character.isLetterOrDigit() (See
>> > >    https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html
>> ).
>> > >    - fragments that contain 1 character are ignored.
>> > >    - fragments that contains 2 or more characters are scanned and
>> every
>> > >    every pair of characters used to create a Hasher (See
>> > >
>> > >
>> https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html
>> > > )
>> > >    that hasher is added to a Bloom filter associated with the wildcard
>> > > pattern.
>> > >
>> > > When a target is being evaluated and matching wildcard entries are to
>> be
>> > > located. Split and create a bloom filter using entry and the same
>> strategy
>> > > as for the wildcard patterns above.  These bloom filters will have
>> had more
>> > > pairs of characters added than for a matching wildcard pattern.
>> > >
>> > > Each filter contains the pattern for the unique pairs in the
>> fragments of
>> > > the original text:
>> > >
>> > > Now we can check the Bloom filters.
>> > >
>> > > To find potential matching patterns we can check to see if the Bloom
>> filter
>> > > for the pattern is contained within the bloom filter for the entry
>> text.
>> > > If so then we know that it is likely that the pairs of characters
>> specified
>> > > in the wild card (non-wildcard text) appear in the entry text.
>> > >
>> > > At this point we can evaluate the reduced number of patterns to see
>> which
>> > > ones actually match.
>> > >
>> > > Having reduced the candidates to the matching patterns we can then
>> use a
>> > > standard measure of similarity like the Levenshtein distance to
>> determine
>> > > which candidates require the fewest edits to evolve into the target.
>> The
>> > > one with the fewest edits is the most specific and should be applied.
>> > >
>> > > Now if you want to know which patterns overlap you have a similar
>> process.
>> > >
>> > > Each Bloom filter can calculate the estimated number of unique
>> fragments
>> > > that were added to it.  If the filters are sorted by this estimation,
>> the
>> > > ones with the highest counts may contain the ones with the lowest
>> counts,
>> > > but a lower count can never contain a higher count.  The calculation
>> to
>> > > perform the estimation can be skipped and the cardinality value of
>> each
>> > > Bloom filter can be used instead.  You can then check the smaller
>> filters
>> > > against the larger ones and find where one candidate is contained
>> within
>> > > another.
>> > >
>> > > If you want to know if they intersect the BloomFilter from
>> > > commons-collections has an estimateIntersection as well as an
>> estimateUnion
>> > > method.  Quick tests can be made between filters to see if there is
>> any
>> > > overlap before more complex analysis is performed.
>> > >
>> > > The solution here does not make the final comparisons easier, it
>> simply
>> > > reduces the search space to find the items that need to be compared.
>> > >
>> > >
>> > > On Mon, May 6, 2024 at 9:03 PM Greg Harris
>> <[email protected]>
>> > > wrote:
>> > >
>> > > > Hi Murali,
>> > > >
>> > > > Thanks for the KIP!
>> > > >
>> > > > I think I understand the motivation for this KIP in situations where
>> > > > there are a "cross product" of topics for two or more variables X
>> and
>> > > > Y, and want to write ACLs for each of the variable axes.
>> > > > If you format your topics "X-Y-suffix", it's not easy to write rules
>> > > > that apply to all "Y" topics, because you need to enumerate all of
>> the
>> > > > "X" values, and the problem persists even if you reorder the topic
>> > > > name.
>> > > >
>> > > > In my recent work on KIP-986 I found it necessary to introduce
>> > > > "namespaces" to group topics together, and I was going to replicate
>> > > > the ACL system to specify those namespaces. This change to the ACL
>> > > > system could increase the expressiveness and complexity of that
>> > > > feature, if it is ever implemented.
>> > > > One of the primitives I needed when specifying namespaces was the
>> > > > ability to tell when two namespaces overlapped (i.e. does there
>> exist
>> > > > any topic which is present in both namespaces). This is trivial to
>> do
>> > > > with the current PREFIX and LITERAL system, as we can find the
>> > > > maximum-length common prefix with just some length comparisons and
>> > > > equality checks.
>> > > > I considered specifying namespaces via regular expressions, and
>> found
>> > > > that it was computationally much more difficult. Computing the
>> > > > intersection of two regexes appears to be exponential in the length
>> of
>> > > > the regexes, leading me to avoid adding it.
>> > > >
>> > > > I understand that you're not suggesting full REGEX support, and that
>> > > > "namespaces" don't need to support MATCH, but I think MATCH may run
>> > > > into related difficulties. Any MATCH can overlap with any other
>> MATCH
>> > > > or PREFIX if it includes a sufficient number of wildcards. For
>> > > > example:
>> > > > MATCH *-accounts-* has overlap with PREFIX nl as they can both match
>> > > > "nl-accounts-localtopic", but that isn't sensitive to the contents
>> > > > "nl", it is true for any PREFIX.
>> > > > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can
>> > > > both match "nl-accounts-localtopic", but that isn't actually
>> sensitive
>> > > > to the contents "localtopic", it's true for any MATCH which
>> includes a
>> > > > wildcard at the beginning.
>> > > >
>> > > > This has implications for execution complexity: If we can't compute
>> > > > whether two patterns overlap, then we need to run both of them on
>> each
>> > > > piece of input to test if they both match. Under the current
>> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but
>> that
>> > > > option wouldn't be available to us with MATCH.
>> > > >
>> > > > The current system makes users evaluate a trade-off:
>> > > > 1. Optimize the number of ACLs by organizing topics according to
>> > > > prefixes (for example, "accounts-localtopic-nl" and PREFIX
>> "accounts",
>> > > > PREFIX "accounts-localtopic")
>> > > > 2. Use less-structured topic names, with a corresponding ACL scheme
>> > > > that has more individual rules.
>> > > > The system currently informs users of this tradeoff by making them
>> > > > write multiple ACLs, and making them think "there has got to be a
>> > > > better way!". Perhaps we can find a better way to surface this best
>> > > > practice, or better inform users about it.
>> > > >
>> > > > I understand that there are going to be situations more complex than
>> > > > your example, where multiple individual rules will always be
>> necessary
>> > > > with only PREFIX evaluation. I think even in those situations, a
>> > > > number of efficient-to-evaluate rules is preferable to just one
>> > > > expensive-to-evaluate rule.
>> > > >
>> > > > One alternative that I thought of could be "PARAMETERIZED" ACLs
>> which
>> > > > are like PREFIXED, but allow some parameter substitution. For
>> example
>> > > > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here,
>> > > > but this isn't actually a regex, and wouldn't allow arbitrary
>> numbers
>> > > > of characters, or the * or + operators.
>> > > > In the background it could evaluate exactly like the 3 individual
>> > > > PREFIX rules, but be easier to evaluate on the backend, and support
>> > > > the intersection query I mentioned earlier. It could also support
>> > > > [a-zA-Z] notation in case the parameter values aren't known ahead of
>> > > > time, but have a fixed length.
>> > > >
>> > > > Thanks,
>> > > > Greg
>> > > >
>> > > > On Mon, May 6, 2024 at 11:17 AM Claude Warren <[email protected]>
>> wrote:
>> > > > >
>> > > > > I have an idea for how to reduce the time for ACL lookups in
>> general
>> > > and
>> > > > > particularly where wildcards are involved using sequence
>> > > > > characterization techniques from bioinformatics.  But I need a
>> set of
>> > > ACL
>> > > > > patterns and associated topics to test with.
>> > > > >
>> > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <[email protected]>
>> > > wrote:
>> > > > >
>> > > > > > Hi, Murali.
>> > > > > >
>> > > > > > First, could you add the KIP-1042 to the index (
>> > > > > >
>> > > > > >
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> > > > > > )
>> > > > > > as well so that everyone can find it easily?
>> > > > > >
>> > > > > > I took a look at the KIP, then I have 2 questions:
>> > > > > >
>> > > > > > 1. Though the new MATCH resource pattern type may reduce the
>> effort
>> > > of
>> > > > > > adding ACLs in some cases, do you have any concrete use case
>> you are
>> > > in
>> > > > > > mind? (When prefixed ACL was introduced in KIP-290, there was a
>> > > > use-case
>> > > > > > that using it for implementing multi-tenancy)
>> > > > > >
>> > > > > > 2. As you may know, ACL lookup is in the hot-path which the
>> > > > performance is
>> > > > > > very important. (
>> > > > > >
>> > > > > >
>> > > >
>> > >
>> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
>> > > > > > ).
>> > > > > > Do you have ideas how do we update `matchingAcls` to support
>> > > > MATCH-type ACL
>> > > > > > without introducing performance issue?
>> > > > > >
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <[email protected]
>> > > .invalid>:
>> > > > > >
>> > > > > > > As I wrote in [1], the ACL evaluation algorithm needs to be
>> > > specified
>> > > > > > with
>> > > > > > > respect to the specificity of the pattern so that we know
>> exactly
>> > > > which
>> > > > > > if
>> > > > > > > *-accounts-* takes precedence over nl-accounts-* or visa
>> versa.
>> > > > > > >
>> > > > > > > I think that we should spell out that precedence is evaluated
>> as
>> > > > follows:
>> > > > > > >
>> > > > > > > 1. Remove patterns that do not match
>> > > > > > > 2. More specific patterns take precedence over less specific
>> > > patterns
>> > > > > > > 3. for patterns of the same precedence DENY overrides ALLOW
>> > > > > > >
>> > > > > > > Determining specificity:
>> > > > > > >
>> > > > > > > Specificity is based on the Levenshtein distance between the
>> > > pattern
>> > > > and
>> > > > > > > the text being evaluated. The lower the distance the more
>> specific
>> > > > the
>> > > > > > > rule.
>> > > > > > > Using the topic name: nl-accounts-localtopic we can evaluate
>> the
>> > > > > > > Levenshtein distance for various patterns.
>> > > > > > > nl-accounts-localtopic = 0
>> > > > > > > *-accounts-localtopic = 2
>> > > > > > > nl-accounts-local* = 5
>> > > > > > > *-accounts-local* = 7
>> > > > > > > nl-accounts-* = 10
>> > > > > > > *-accounts-* = 12
>> > > > > > >
>> > > > > > > In the special case of matching principles User matches are
>> more
>> > > > specific
>> > > > > > > than Group matches.
>> > > > > > >
>> > > > > > > I don't know if this should be added to KIP-1042 or presented
>> as a
>> > > > new
>> > > > > > KIP.
>> > > > > > >
>> > > > > > > Claude
>> > > > > > >
>> > > > > > > [1]
>> > > https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
>> > > > > > > <
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
>> > > > > > > >
>> > > > > > >
>> > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <
>> [email protected]>
>> > > > wrote:
>> > > > > > >
>> > > > > > > > Took me awhile to find it but the link to the KIP is
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
>> > > > > > > >
>> > > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani <
>> > > > [email protected]
>> > > > > > >
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hello,
>> > > > > > > > >
>> > > > > > > > > I'd like to propose a suggestion to our resource patterns
>> in
>> > > > Kafka
>> > > > > > > ACLs.
>> > > > > > > > >
>> > > > > > > > > Currently, when adding new ACLs in Kafka, we have two
>> types of
>> > > > > > resource
>> > > > > > > > > patterns for topics:
>> > > > > > > > >
>> > > > > > > > >    - LITERAL
>> > > > > > > > >    - PREFIXED
>> > > > > > > > >
>> > > > > > > > > However, when it comes to listing or removing ACLs, we
>> have a
>> > > > couple
>> > > > > > > more
>> > > > > > > > > options:
>> > > > > > > > >
>> > > > > > > > >    - MATCH
>> > > > > > > > >    - ANY (will match any pattern type)
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > If we can extend creating acls as well with 'MATCH'
>> pattern
>> > > > type, it
>> > > > > > > > would
>> > > > > > > > > be very beneficial. Even though this kind of acl should be
>> > > > created
>> > > > > > with
>> > > > > > > > > utmost care, it will help organizations streamline their
>> ACL
>> > > > > > management
>> > > > > > > > > processes.
>> > > > > > > > >
>> > > > > > > > > Example scenarios :
>> > > > > > > > >
>> > > > > > > > > Let's say we need to create ACLs for the following six
>> topics:
>> > > > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic,
>> > > > > > > de-accounts-localtopic,
>> > > > > > > > > de-accounts-remotetopic, cz-accounts-localtopic,
>> > > > > > > cz-accounts-remotetopic
>> > > > > > > > >
>> > > > > > > > > Currently, we achieve this using existing functionality by
>> > > > creating
>> > > > > > > three
>> > > > > > > > > prefixed ACLs as shown below:
>> > > > > > > > >
>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>> > > > > > > > > > --add \
>> > > > > > > > > > --allow-principal
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>> > > > > > > > > > \
>> > > > > > > > > > --producer \
>> > > > > > > > > > --topic nl-accounts- \
>> > > > > > > > > > --resource-pattern-type prefixed
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>> > > > > > > > > > --add \
>> > > > > > > > > > --allow-principal
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>> > > > > > > > > > \
>> > > > > > > > > > --producer \
>> > > > > > > > > > --topic de-accounts- \
>> > > > > > > > > > --resource-pattern-type prefixed
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>> > > > > > > > > > --add \
>> > > > > > > > > > --allow-principal
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>> > > > > > > > > > \
>> > > > > > > > > > --producer \
>> > > > > > > > > > --topic cz-accounts- \
>> > > > > > > > > > --resource-pattern-type prefixed
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > However, if we had the 'MATCH' pattern type available, we
>> could
>> > > > > > > > accomplish
>> > > > > > > > > this with a single ACL, as illustrated here:
>> > > > > > > > >
>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>> > > > > > > > > > --add \
>> > > > > > > > > > --allow-principal
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>> > > > > > > > > > \
>> > > > > > > > > > --producer \
>> > > > > > > > > > --topic *-accounts-* \
>> > > > > > > > > > --resource-pattern-type match
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > This pattern closely resembles PREFIXED but offers broader
>> > > > allow/deny
>> > > > > > > > > rules.
>> > > > > > > > >
>> > > > > > > > > Implementing this change could significantly reduce the
>> effort
>> > > in
>> > > > > > > several
>> > > > > > > > > acl management processes.
>> > > > > > > > >
>> > > > > > > > > I welcome your thoughts and any concerns you may have
>> regarding
>> > > > this
>> > > > > > > > > proposal.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > Murali
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > ========================
>> > > > > > Okada Haruki
>> > > > > > [email protected]
>> > > > > > ========================
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
>> > > >
>> > >
>> > >
>> > > --
>> > > LinkedIn: http://www.linkedin.com/in/claudewarren
>> > >
>>
>

Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

Reply via email to