>
> This has implications for execution complexity: If we can't compute
> whether two patterns overlap, then we need to run both of them on each
> piece of input to test if they both match. Under the current
> LITERAL/PREFIX system, we can optimize execution with a trie, but that
> option wouldn't be available to us with MATCH.
>

If we consider the case of an asterisk representing 1 or more characters
then determining if 2 patterns overlap can be fairly simple and very fast.
Let's call the text from the ACL the target, and the text from the wildcard
matcher the candidate.

When a wildcard pattern, excluding '*',  is created:

   - the candidate text is broken into fragments separated by characters
   that are not Character.isLetterOrDigit() (See
   https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html).
   - fragments that contain 1 character are ignored.
   - fragments that contains 2 or more characters are scanned and every
   every pair of characters used to create a Hasher (See
   
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html)
   that hasher is added to a Bloom filter associated with the wildcard pattern.

When a target is being evaluated and matching wildcard entries are to be
located. Split and create a bloom filter using entry and the same strategy
as for the wildcard patterns above.  These bloom filters will have had more
pairs of characters added than for a matching wildcard pattern.

Each filter contains the pattern for the unique pairs in the fragments of
the original text:

Now we can check the Bloom filters.

To find potential matching patterns we can check to see if the Bloom filter
for the pattern is contained within the bloom filter for the entry text.
If so then we know that it is likely that the pairs of characters specified
in the wild card (non-wildcard text) appear in the entry text.

At this point we can evaluate the reduced number of patterns to see which
ones actually match.

Having reduced the candidates to the matching patterns we can then use a
standard measure of similarity like the Levenshtein distance to determine
which candidates require the fewest edits to evolve into the target.  The
one with the fewest edits is the most specific and should be applied.

Now if you want to know which patterns overlap you have a similar process.

Each Bloom filter can calculate the estimated number of unique fragments
that were added to it.  If the filters are sorted by this estimation, the
ones with the highest counts may contain the ones with the lowest counts,
but a lower count can never contain a higher count.  The calculation to
perform the estimation can be skipped and the cardinality value of each
Bloom filter can be used instead.  You can then check the smaller filters
against the larger ones and find where one candidate is contained within
another.

If you want to know if they intersect the BloomFilter from
commons-collections has an estimateIntersection as well as an estimateUnion
method.  Quick tests can be made between filters to see if there is any
overlap before more complex analysis is performed.

The solution here does not make the final comparisons easier, it simply
reduces the search space to find the items that need to be compared.


On Mon, May 6, 2024 at 9:03 PM Greg Harris <greg.har...@aiven.io.invalid>
wrote:

> Hi Murali,
>
> Thanks for the KIP!
>
> I think I understand the motivation for this KIP in situations where
> there are a "cross product" of topics for two or more variables X and
> Y, and want to write ACLs for each of the variable axes.
> If you format your topics "X-Y-suffix", it's not easy to write rules
> that apply to all "Y" topics, because you need to enumerate all of the
> "X" values, and the problem persists even if you reorder the topic
> name.
>
> In my recent work on KIP-986 I found it necessary to introduce
> "namespaces" to group topics together, and I was going to replicate
> the ACL system to specify those namespaces. This change to the ACL
> system could increase the expressiveness and complexity of that
> feature, if it is ever implemented.
> One of the primitives I needed when specifying namespaces was the
> ability to tell when two namespaces overlapped (i.e. does there exist
> any topic which is present in both namespaces). This is trivial to do
> with the current PREFIX and LITERAL system, as we can find the
> maximum-length common prefix with just some length comparisons and
> equality checks.
> I considered specifying namespaces via regular expressions, and found
> that it was computationally much more difficult. Computing the
> intersection of two regexes appears to be exponential in the length of
> the regexes, leading me to avoid adding it.
>
> I understand that you're not suggesting full REGEX support, and that
> "namespaces" don't need to support MATCH, but I think MATCH may run
> into related difficulties. Any MATCH can overlap with any other MATCH
> or PREFIX if it includes a sufficient number of wildcards. For
> example:
> MATCH *-accounts-* has overlap with PREFIX nl as they can both match
> "nl-accounts-localtopic", but that isn't sensitive to the contents
> "nl", it is true for any PREFIX.
> MATCH *-accounts-* has overlap with MATCH *localtopic, as they can
> both match "nl-accounts-localtopic", but that isn't actually sensitive
> to the contents "localtopic", it's true for any MATCH which includes a
> wildcard at the beginning.
>
> This has implications for execution complexity: If we can't compute
> whether two patterns overlap, then we need to run both of them on each
> piece of input to test if they both match. Under the current
> LITERAL/PREFIX system, we can optimize execution with a trie, but that
> option wouldn't be available to us with MATCH.
>
> The current system makes users evaluate a trade-off:
> 1. Optimize the number of ACLs by organizing topics according to
> prefixes (for example, "accounts-localtopic-nl" and PREFIX "accounts",
> PREFIX "accounts-localtopic")
> 2. Use less-structured topic names, with a corresponding ACL scheme
> that has more individual rules.
> The system currently informs users of this tradeoff by making them
> write multiple ACLs, and making them think "there has got to be a
> better way!". Perhaps we can find a better way to surface this best
> practice, or better inform users about it.
>
> I understand that there are going to be situations more complex than
> your example, where multiple individual rules will always be necessary
> with only PREFIX evaluation. I think even in those situations, a
> number of efficient-to-evaluate rules is preferable to just one
> expensive-to-evaluate rule.
>
> One alternative that I thought of could be "PARAMETERIZED" ACLs which
> are like PREFIXED, but allow some parameter substitution. For example
> PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here,
> but this isn't actually a regex, and wouldn't allow arbitrary numbers
> of characters, or the * or + operators.
> In the background it could evaluate exactly like the 3 individual
> PREFIX rules, but be easier to evaluate on the backend, and support
> the intersection query I mentioned earlier. It could also support
> [a-zA-Z] notation in case the parameter values aren't known ahead of
> time, but have a fixed length.
>
> Thanks,
> Greg
>
> On Mon, May 6, 2024 at 11:17 AM Claude Warren <cla...@xenei.com> wrote:
> >
> > I have an idea for how to reduce the time for ACL lookups in general and
> > particularly where wildcards are involved using sequence
> > characterization techniques from bioinformatics.  But I need a set of ACL
> > patterns and associated topics to test with.
> >
> > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <ocadar...@gmail.com> wrote:
> >
> > > Hi, Murali.
> > >
> > > First, could you add the KIP-1042 to the index (
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > )
> > > as well so that everyone can find it easily?
> > >
> > > I took a look at the KIP, then I have 2 questions:
> > >
> > > 1. Though the new MATCH resource pattern type may reduce the effort of
> > > adding ACLs in some cases, do you have any concrete use case you are in
> > > mind? (When prefixed ACL was introduced in KIP-290, there was a
> use-case
> > > that using it for implementing multi-tenancy)
> > >
> > > 2. As you may know, ACL lookup is in the hot-path which the
> performance is
> > > very important. (
> > >
> > >
> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
> > > ).
> > > Do you have ideas how do we update `matchingAcls` to support
> MATCH-type ACL
> > > without introducing performance issue?
> > >
> > >
> > > Thanks,
> > >
> > > 2024年5月3日(金) 19:51 Claude Warren, Jr <claude.war...@aiven.io.invalid>:
> > >
> > > > As I wrote in [1], the ACL evaluation algorithm needs to be specified
> > > with
> > > > respect to the specificity of the pattern so that we know exactly
> which
> > > if
> > > > *-accounts-* takes precedence over nl-accounts-* or visa versa.
> > > >
> > > > I think that we should spell out that precedence is evaluated as
> follows:
> > > >
> > > > 1. Remove patterns that do not match
> > > > 2. More specific patterns take precedence over less specific patterns
> > > > 3. for patterns of the same precedence DENY overrides ALLOW
> > > >
> > > > Determining specificity:
> > > >
> > > > Specificity is based on the Levenshtein distance between the pattern
> and
> > > > the text being evaluated. The lower the distance the more specific
> the
> > > > rule.
> > > > Using the topic name: nl-accounts-localtopic we can evaluate the
> > > > Levenshtein distance for various patterns.
> > > > nl-accounts-localtopic = 0
> > > > *-accounts-localtopic = 2
> > > > nl-accounts-local* = 5
> > > > *-accounts-local* = 7
> > > > nl-accounts-* = 10
> > > > *-accounts-* = 12
> > > >
> > > > In the special case of matching principles User matches are more
> specific
> > > > than Group matches.
> > > >
> > > > I don't know if this should be added to KIP-1042 or presented as a
> new
> > > KIP.
> > > >
> > > > Claude
> > > >
> > > > [1] https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
> > > > <
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > >
> > > >
> > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <cla...@xenei.com>
> wrote:
> > > >
> > > > > Took me awhile to find it but the link to the KIP is
> > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > >
> > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani <
> murali.bas...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to propose a suggestion to our resource patterns in
> Kafka
> > > > ACLs.
> > > > > >
> > > > > > Currently, when adding new ACLs in Kafka, we have two types of
> > > resource
> > > > > > patterns for topics:
> > > > > >
> > > > > >    - LITERAL
> > > > > >    - PREFIXED
> > > > > >
> > > > > > However, when it comes to listing or removing ACLs, we have a
> couple
> > > > more
> > > > > > options:
> > > > > >
> > > > > >    - MATCH
> > > > > >    - ANY (will match any pattern type)
> > > > > >
> > > > > >
> > > > > > If we can extend creating acls as well with 'MATCH' pattern
> type, it
> > > > > would
> > > > > > be very beneficial. Even though this kind of acl should be
> created
> > > with
> > > > > > utmost care, it will help organizations streamline their ACL
> > > management
> > > > > > processes.
> > > > > >
> > > > > > Example scenarios :
> > > > > >
> > > > > > Let's say we need to create ACLs for the following six topics:
> > > > > > nl-accounts-localtopic, nl-accounts-remotetopic,
> > > > de-accounts-localtopic,
> > > > > > de-accounts-remotetopic, cz-accounts-localtopic,
> > > > cz-accounts-remotetopic
> > > > > >
> > > > > > Currently, we achieve this using existing functionality by
> creating
> > > > three
> > > > > > prefixed ACLs as shown below:
> > > > > >
> > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > --add \
> > > > > > > --allow-principal
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > \
> > > > > > > --producer \
> > > > > > > --topic nl-accounts- \
> > > > > > > --resource-pattern-type prefixed
> > > > > >
> > > > > >
> > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > --add \
> > > > > > > --allow-principal
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > \
> > > > > > > --producer \
> > > > > > > --topic de-accounts- \
> > > > > > > --resource-pattern-type prefixed
> > > > > >
> > > > > >
> > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > --add \
> > > > > > > --allow-principal
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > \
> > > > > > > --producer \
> > > > > > > --topic cz-accounts- \
> > > > > > > --resource-pattern-type prefixed
> > > > > >
> > > > > >
> > > > > > However, if we had the 'MATCH' pattern type available, we could
> > > > > accomplish
> > > > > > this with a single ACL, as illustrated here:
> > > > > >
> > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > --add \
> > > > > > > --allow-principal
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > \
> > > > > > > --producer \
> > > > > > > --topic *-accounts-* \
> > > > > > > --resource-pattern-type match
> > > > > >
> > > > > >
> > > > > >
> > > > > > This pattern closely resembles PREFIXED but offers broader
> allow/deny
> > > > > > rules.
> > > > > >
> > > > > > Implementing this change could significantly reduce the effort in
> > > > several
> > > > > > acl management processes.
> > > > > >
> > > > > > I welcome your thoughts and any concerns you may have regarding
> this
> > > > > > proposal.
> > > > > >
> > > > > > Thanks,
> > > > > > Murali
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > > >
> > > >
> > >
> > >
> > > --
> > > ========================
> > > Okada Haruki
> > > ocadar...@gmail.com
> > > ========================
> > >
> >
> >
> > --
> > LinkedIn: http://www.linkedin.com/in/claudewarren
>


-- 
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to