Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

Muralidhar Basani Mon, 03 Jun 2024 13:32:38 -0700

Hi,

Just bumping this thread again. It seems no concerns have been raised so
far.


I'll request votes in 2 weeks.

Thanks,
Murali

On Tue, May 28, 2024 at 1:24 PM Muralidhar Basani <
[email protected]> wrote:

> Hi all,
>
> Any other suggestions or objections to the proposal?
>
> Thanks,
> Murali
>
> On Thu, May 23, 2024 at 10:18 AM Muralidhar Basani <
> [email protected]> wrote:
>
>> Thanks Greg.
>> I have updated KIP with details on optimisation of LITERAL too.
>>
>> Regarding limitations in performance by introducing MATCH is definitely a
>> question.
>> - By optimizing LITERAL and PREFIX we are gaining a few nano secs I
>> think. (described in kip)
>> - MATCH can be introduced only with a configuration. So by default, we
>> can disable the MATCH check (ex : acls.pattern.match.enable=false), and if
>> customer enables the config, only then add the lookup in the matchingAcls()
>> - With the proposal already described in KIP for MATCH, we may not have
>> any limitations., rather will be efficient.
>>
>> Maybe we are in good shape with these propositions
>>
>> Thanks,
>> Murali
>>
>>
>>
>> On Tue, May 21, 2024 at 6:15 PM Greg Harris <[email protected]>
>> wrote:
>>
>>> Hi Murali,
>>>
>>> I don't have a trie library in mind. I looked at the current
>>> implementation of the StandardAuthorizer and found that we are already
>>> benefiting from the prefix structure in the implementation [1]. The
>>> current implementation appears to be a TreePSet [2].
>>>
>>> Now, we've already made this tradeoff once with PREFIX: Prefixes are
>>> less structured than literals, because with literals you can use a
>>> hashing algorithm to jump directly to your relevant ACLs in O(1), but
>>> with a prefix you either need to do multiple lookups, or some sort of
>>> O(log(n)) lookup. And we determined that the ultimate limitation in
>>> performance was worth it for the expressiveness.
>>> We're making this tradeoff again with MATCH acls: wildcards are less
>>> structured than prefixes or literals, because of the reasons I
>>> mentioned earlier. We need to judge now if the ultimate limitation in
>>> performance is worth it.
>>>
>>> I think your strategy for using the optimized layout for prefix and
>>> literal matches is smart, because there does seem to be a gap in
>>> performance possible. It makes me wonder why the optimized layout for
>>> literals was not used when prefixes were added. Literal lookups still
>>> go through the tree lookup, when they could be moved to a hash-lookup
>>> instead.
>>> That would allow users to "choose" for themselves on a convenience vs
>>> performance scale: Smaller use-cases can add a single convenient
>>> MATCH, and larger use-cases can add the multiple optimized PREFIXes.
>>>
>>> [1]
>>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339
>>> [2]
>>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34
>>>
>>> Thanks,
>>> Greg
>>>
>>> On Tue, May 21, 2024 at 8:23 AM Muralidhar Basani
>>> <[email protected]> wrote:
>>> >
>>> > @greg which library of trie you were thinking of ? There is one in the
>>> > commons-collection I see.
>>> >
>>> > On Fri, May 17, 2024 at 3:55 PM Claude Warren <[email protected]>
>>> wrote:
>>> >
>>> > > >
>>> > > > This has implications for execution complexity: If we can't compute
>>> > > > whether two patterns overlap, then we need to run both of them on
>>> each
>>> > > > piece of input to test if they both match. Under the current
>>> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but
>>> that
>>> > > > option wouldn't be available to us with MATCH.
>>> > > >
>>> > >
>>> > > If we consider the case of an asterisk representing 1 or more
>>> characters
>>> > > then determining if 2 patterns overlap can be fairly simple and very
>>> fast.
>>> > > Let's call the text from the ACL the target, and the text from the
>>> wildcard
>>> > > matcher the candidate.
>>> > >
>>> > > When a wildcard pattern, excluding '*',  is created:
>>> > >
>>> > >    - the candidate text is broken into fragments separated by
>>> characters
>>> > >    that are not Character.isLetterOrDigit() (See
>>> > >
>>> https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html).
>>> > >    - fragments that contain 1 character are ignored.
>>> > >    - fragments that contains 2 or more characters are scanned and
>>> every
>>> > >    every pair of characters used to create a Hasher (See
>>> > >
>>> > >
>>> https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html
>>> > > )
>>> > >    that hasher is added to a Bloom filter associated with the
>>> wildcard
>>> > > pattern.
>>> > >
>>> > > When a target is being evaluated and matching wildcard entries are
>>> to be
>>> > > located. Split and create a bloom filter using entry and the same
>>> strategy
>>> > > as for the wildcard patterns above.  These bloom filters will have
>>> had more
>>> > > pairs of characters added than for a matching wildcard pattern.
>>> > >
>>> > > Each filter contains the pattern for the unique pairs in the
>>> fragments of
>>> > > the original text:
>>> > >
>>> > > Now we can check the Bloom filters.
>>> > >
>>> > > To find potential matching patterns we can check to see if the Bloom
>>> filter
>>> > > for the pattern is contained within the bloom filter for the entry
>>> text.
>>> > > If so then we know that it is likely that the pairs of characters
>>> specified
>>> > > in the wild card (non-wildcard text) appear in the entry text.
>>> > >
>>> > > At this point we can evaluate the reduced number of patterns to see
>>> which
>>> > > ones actually match.
>>> > >
>>> > > Having reduced the candidates to the matching patterns we can then
>>> use a
>>> > > standard measure of similarity like the Levenshtein distance to
>>> determine
>>> > > which candidates require the fewest edits to evolve into the
>>> target.  The
>>> > > one with the fewest edits is the most specific and should be applied.
>>> > >
>>> > > Now if you want to know which patterns overlap you have a similar
>>> process.
>>> > >
>>> > > Each Bloom filter can calculate the estimated number of unique
>>> fragments
>>> > > that were added to it.  If the filters are sorted by this
>>> estimation, the
>>> > > ones with the highest counts may contain the ones with the lowest
>>> counts,
>>> > > but a lower count can never contain a higher count.  The calculation
>>> to
>>> > > perform the estimation can be skipped and the cardinality value of
>>> each
>>> > > Bloom filter can be used instead.  You can then check the smaller
>>> filters
>>> > > against the larger ones and find where one candidate is contained
>>> within
>>> > > another.
>>> > >
>>> > > If you want to know if they intersect the BloomFilter from
>>> > > commons-collections has an estimateIntersection as well as an
>>> estimateUnion
>>> > > method.  Quick tests can be made between filters to see if there is
>>> any
>>> > > overlap before more complex analysis is performed.
>>> > >
>>> > > The solution here does not make the final comparisons easier, it
>>> simply
>>> > > reduces the search space to find the items that need to be compared.
>>> > >
>>> > >
>>> > > On Mon, May 6, 2024 at 9:03 PM Greg Harris
>>> <[email protected]>
>>> > > wrote:
>>> > >
>>> > > > Hi Murali,
>>> > > >
>>> > > > Thanks for the KIP!
>>> > > >
>>> > > > I think I understand the motivation for this KIP in situations
>>> where
>>> > > > there are a "cross product" of topics for two or more variables X
>>> and
>>> > > > Y, and want to write ACLs for each of the variable axes.
>>> > > > If you format your topics "X-Y-suffix", it's not easy to write
>>> rules
>>> > > > that apply to all "Y" topics, because you need to enumerate all of
>>> the
>>> > > > "X" values, and the problem persists even if you reorder the topic
>>> > > > name.
>>> > > >
>>> > > > In my recent work on KIP-986 I found it necessary to introduce
>>> > > > "namespaces" to group topics together, and I was going to replicate
>>> > > > the ACL system to specify those namespaces. This change to the ACL
>>> > > > system could increase the expressiveness and complexity of that
>>> > > > feature, if it is ever implemented.
>>> > > > One of the primitives I needed when specifying namespaces was the
>>> > > > ability to tell when two namespaces overlapped (i.e. does there
>>> exist
>>> > > > any topic which is present in both namespaces). This is trivial to
>>> do
>>> > > > with the current PREFIX and LITERAL system, as we can find the
>>> > > > maximum-length common prefix with just some length comparisons and
>>> > > > equality checks.
>>> > > > I considered specifying namespaces via regular expressions, and
>>> found
>>> > > > that it was computationally much more difficult. Computing the
>>> > > > intersection of two regexes appears to be exponential in the
>>> length of
>>> > > > the regexes, leading me to avoid adding it.
>>> > > >
>>> > > > I understand that you're not suggesting full REGEX support, and
>>> that
>>> > > > "namespaces" don't need to support MATCH, but I think MATCH may run
>>> > > > into related difficulties. Any MATCH can overlap with any other
>>> MATCH
>>> > > > or PREFIX if it includes a sufficient number of wildcards. For
>>> > > > example:
>>> > > > MATCH *-accounts-* has overlap with PREFIX nl as they can both
>>> match
>>> > > > "nl-accounts-localtopic", but that isn't sensitive to the contents
>>> > > > "nl", it is true for any PREFIX.
>>> > > > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can
>>> > > > both match "nl-accounts-localtopic", but that isn't actually
>>> sensitive
>>> > > > to the contents "localtopic", it's true for any MATCH which
>>> includes a
>>> > > > wildcard at the beginning.
>>> > > >
>>> > > > This has implications for execution complexity: If we can't compute
>>> > > > whether two patterns overlap, then we need to run both of them on
>>> each
>>> > > > piece of input to test if they both match. Under the current
>>> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but
>>> that
>>> > > > option wouldn't be available to us with MATCH.
>>> > > >
>>> > > > The current system makes users evaluate a trade-off:
>>> > > > 1. Optimize the number of ACLs by organizing topics according to
>>> > > > prefixes (for example, "accounts-localtopic-nl" and PREFIX
>>> "accounts",
>>> > > > PREFIX "accounts-localtopic")
>>> > > > 2. Use less-structured topic names, with a corresponding ACL scheme
>>> > > > that has more individual rules.
>>> > > > The system currently informs users of this tradeoff by making them
>>> > > > write multiple ACLs, and making them think "there has got to be a
>>> > > > better way!". Perhaps we can find a better way to surface this best
>>> > > > practice, or better inform users about it.
>>> > > >
>>> > > > I understand that there are going to be situations more complex
>>> than
>>> > > > your example, where multiple individual rules will always be
>>> necessary
>>> > > > with only PREFIX evaluation. I think even in those situations, a
>>> > > > number of efficient-to-evaluate rules is preferable to just one
>>> > > > expensive-to-evaluate rule.
>>> > > >
>>> > > > One alternative that I thought of could be "PARAMETERIZED" ACLs
>>> which
>>> > > > are like PREFIXED, but allow some parameter substitution. For
>>> example
>>> > > > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax
>>> here,
>>> > > > but this isn't actually a regex, and wouldn't allow arbitrary
>>> numbers
>>> > > > of characters, or the * or + operators.
>>> > > > In the background it could evaluate exactly like the 3 individual
>>> > > > PREFIX rules, but be easier to evaluate on the backend, and support
>>> > > > the intersection query I mentioned earlier. It could also support
>>> > > > [a-zA-Z] notation in case the parameter values aren't known ahead
>>> of
>>> > > > time, but have a fixed length.
>>> > > >
>>> > > > Thanks,
>>> > > > Greg
>>> > > >
>>> > > > On Mon, May 6, 2024 at 11:17 AM Claude Warren <[email protected]>
>>> wrote:
>>> > > > >
>>> > > > > I have an idea for how to reduce the time for ACL lookups in
>>> general
>>> > > and
>>> > > > > particularly where wildcards are involved using sequence
>>> > > > > characterization techniques from bioinformatics.  But I need a
>>> set of
>>> > > ACL
>>> > > > > patterns and associated topics to test with.
>>> > > > >
>>> > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <[email protected]
>>> >
>>> > > wrote:
>>> > > > >
>>> > > > > > Hi, Murali.
>>> > > > > >
>>> > > > > > First, could you add the KIP-1042 to the index (
>>> > > > > >
>>> > > > > >
>>> > > >
>>> > >
>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>> > > > > > )
>>> > > > > > as well so that everyone can find it easily?
>>> > > > > >
>>> > > > > > I took a look at the KIP, then I have 2 questions:
>>> > > > > >
>>> > > > > > 1. Though the new MATCH resource pattern type may reduce the
>>> effort
>>> > > of
>>> > > > > > adding ACLs in some cases, do you have any concrete use case
>>> you are
>>> > > in
>>> > > > > > mind? (When prefixed ACL was introduced in KIP-290, there was a
>>> > > > use-case
>>> > > > > > that using it for implementing multi-tenancy)
>>> > > > > >
>>> > > > > > 2. As you may know, ACL lookup is in the hot-path which the
>>> > > > performance is
>>> > > > > > very important. (
>>> > > > > >
>>> > > > > >
>>> > > >
>>> > >
>>> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
>>> > > > > > ).
>>> > > > > > Do you have ideas how do we update `matchingAcls` to support
>>> > > > MATCH-type ACL
>>> > > > > > without introducing performance issue?
>>> > > > > >
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <[email protected]
>>> > > .invalid>:
>>> > > > > >
>>> > > > > > > As I wrote in [1], the ACL evaluation algorithm needs to be
>>> > > specified
>>> > > > > > with
>>> > > > > > > respect to the specificity of the pattern so that we know
>>> exactly
>>> > > > which
>>> > > > > > if
>>> > > > > > > *-accounts-* takes precedence over nl-accounts-* or visa
>>> versa.
>>> > > > > > >
>>> > > > > > > I think that we should spell out that precedence is
>>> evaluated as
>>> > > > follows:
>>> > > > > > >
>>> > > > > > > 1. Remove patterns that do not match
>>> > > > > > > 2. More specific patterns take precedence over less specific
>>> > > patterns
>>> > > > > > > 3. for patterns of the same precedence DENY overrides ALLOW
>>> > > > > > >
>>> > > > > > > Determining specificity:
>>> > > > > > >
>>> > > > > > > Specificity is based on the Levenshtein distance between the
>>> > > pattern
>>> > > > and
>>> > > > > > > the text being evaluated. The lower the distance the more
>>> specific
>>> > > > the
>>> > > > > > > rule.
>>> > > > > > > Using the topic name: nl-accounts-localtopic we can evaluate
>>> the
>>> > > > > > > Levenshtein distance for various patterns.
>>> > > > > > > nl-accounts-localtopic = 0
>>> > > > > > > *-accounts-localtopic = 2
>>> > > > > > > nl-accounts-local* = 5
>>> > > > > > > *-accounts-local* = 7
>>> > > > > > > nl-accounts-* = 10
>>> > > > > > > *-accounts-* = 12
>>> > > > > > >
>>> > > > > > > In the special case of matching principles User matches are
>>> more
>>> > > > specific
>>> > > > > > > than Group matches.
>>> > > > > > >
>>> > > > > > > I don't know if this should be added to KIP-1042 or
>>> presented as a
>>> > > > new
>>> > > > > > KIP.
>>> > > > > > >
>>> > > > > > > Claude
>>> > > > > > >
>>> > > > > > > [1]
>>> > > https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
>>> > > > > > > <
>>> > > > > > >
>>> > > > > >
>>> > > >
>>> > >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
>>> > > > > > > >
>>> > > > > > >
>>> > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <
>>> [email protected]>
>>> > > > wrote:
>>> > > > > > >
>>> > > > > > > > Took me awhile to find it but the link to the KIP is
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > >
>>> > >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
>>> > > > > > > >
>>> > > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani <
>>> > > > [email protected]
>>> > > > > > >
>>> > > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > > Hello,
>>> > > > > > > > >
>>> > > > > > > > > I'd like to propose a suggestion to our resource
>>> patterns in
>>> > > > Kafka
>>> > > > > > > ACLs.
>>> > > > > > > > >
>>> > > > > > > > > Currently, when adding new ACLs in Kafka, we have two
>>> types of
>>> > > > > > resource
>>> > > > > > > > > patterns for topics:
>>> > > > > > > > >
>>> > > > > > > > >    - LITERAL
>>> > > > > > > > >    - PREFIXED
>>> > > > > > > > >
>>> > > > > > > > > However, when it comes to listing or removing ACLs, we
>>> have a
>>> > > > couple
>>> > > > > > > more
>>> > > > > > > > > options:
>>> > > > > > > > >
>>> > > > > > > > >    - MATCH
>>> > > > > > > > >    - ANY (will match any pattern type)
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > If we can extend creating acls as well with 'MATCH'
>>> pattern
>>> > > > type, it
>>> > > > > > > > would
>>> > > > > > > > > be very beneficial. Even though this kind of acl should
>>> be
>>> > > > created
>>> > > > > > with
>>> > > > > > > > > utmost care, it will help organizations streamline their
>>> ACL
>>> > > > > > management
>>> > > > > > > > > processes.
>>> > > > > > > > >
>>> > > > > > > > > Example scenarios :
>>> > > > > > > > >
>>> > > > > > > > > Let's say we need to create ACLs for the following six
>>> topics:
>>> > > > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic,
>>> > > > > > > de-accounts-localtopic,
>>> > > > > > > > > de-accounts-remotetopic, cz-accounts-localtopic,
>>> > > > > > > cz-accounts-remotetopic
>>> > > > > > > > >
>>> > > > > > > > > Currently, we achieve this using existing functionality
>>> by
>>> > > > creating
>>> > > > > > > three
>>> > > > > > > > > prefixed ACLs as shown below:
>>> > > > > > > > >
>>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>>> > > > > > > > > > --add \
>>> > > > > > > > > > --allow-principal
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > >
>>> > >
>>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>>> > > > > > > > > > \
>>> > > > > > > > > > --producer \
>>> > > > > > > > > > --topic nl-accounts- \
>>> > > > > > > > > > --resource-pattern-type prefixed
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>>> > > > > > > > > > --add \
>>> > > > > > > > > > --allow-principal
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > >
>>> > >
>>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>>> > > > > > > > > > \
>>> > > > > > > > > > --producer \
>>> > > > > > > > > > --topic de-accounts- \
>>> > > > > > > > > > --resource-pattern-type prefixed
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>>> > > > > > > > > > --add \
>>> > > > > > > > > > --allow-principal
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > >
>>> > >
>>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>>> > > > > > > > > > \
>>> > > > > > > > > > --producer \
>>> > > > > > > > > > --topic cz-accounts- \
>>> > > > > > > > > > --resource-pattern-type prefixed
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > However, if we had the 'MATCH' pattern type available,
>>> we could
>>> > > > > > > > accomplish
>>> > > > > > > > > this with a single ACL, as illustrated here:
>>> > > > > > > > >
>>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>>> > > > > > > > > > --add \
>>> > > > > > > > > > --allow-principal
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > >
>>> > >
>>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>>> > > > > > > > > > \
>>> > > > > > > > > > --producer \
>>> > > > > > > > > > --topic *-accounts-* \
>>> > > > > > > > > > --resource-pattern-type match
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > This pattern closely resembles PREFIXED but offers
>>> broader
>>> > > > allow/deny
>>> > > > > > > > > rules.
>>> > > > > > > > >
>>> > > > > > > > > Implementing this change could significantly reduce the
>>> effort
>>> > > in
>>> > > > > > > several
>>> > > > > > > > > acl management processes.
>>> > > > > > > > >
>>> > > > > > > > > I welcome your thoughts and any concerns you may have
>>> regarding
>>> > > > this
>>> > > > > > > > > proposal.
>>> > > > > > > > >
>>> > > > > > > > > Thanks,
>>> > > > > > > > > Murali
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > --
>>> > > > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > --
>>> > > > > > ========================
>>> > > > > > Okada Haruki
>>> > > > > > [email protected]
>>> > > > > > ========================
>>> > > > > >
>>> > > > >
>>> > > > >
>>> > > > > --
>>> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
>>> > > >
>>> > >
>>> > >
>>> > > --
>>> > > LinkedIn: http://www.linkedin.com/in/claudewarren
>>> > >
>>>
>>

Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

Reply via email to