[ 
https://issues.apache.org/jira/browse/RANGER-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979918#comment-16979918
 ] 

star edited comment on RANGER-2651 at 11/22/19 7:10 AM:
--------------------------------------------------------

In the hive or hbase plugin, the 'column' resource is always the wildcard '*',
so a single TrieNode will hold 100,000 wildcardEvaluators. Each insertion while
building RangerResourceTrie is then O(n) because of the contains() check, and
querying evaluators is O(n*m).
{code:java}
// O(n) in RangerResourceTrie
void addWildcardEvaluator(U evaluator) {
    ...
    if (!wildcardEvaluators.contains(evaluator)) {
        wildcardEvaluators.add(evaluator);
    }
}

// O(n*m) in RangerPolicyRepository
private List<RangerPolicyEvaluator> getLikelyMatchPolicyEvaluators(Map<String, RangerResourceTrie> resourceTrie, RangerAccessResource resource) {
    ...
    ret = new HashSet<>(smallestSet);
    for (List<RangerPolicyEvaluator> resourceEvaluators : resourceEvaluatorsSet) {
        if (resourceEvaluators != smallestSet) {
            // remove policies from ret that are not in resourceEvaluators
            ret.retainAll(resourceEvaluators);
        }
    }
}{code}
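
One way to avoid both linear scans is sketched below. This is only an illustration under stated assumptions, not the attached patch: WildcardNodeSketch and intersect are hypothetical names, and the sketch assumes evaluators implement equals/hashCode consistently. It replaces the List.contains check with a LinkedHashSet, and copies each list into a HashSet before retainAll, since retainAll on a HashSet calls contains on its argument once per element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a trie node; not the actual Ranger class.
final class WildcardNodeSketch<U> {
    // LinkedHashSet keeps insertion order and gives expected O(1) contains/add.
    private final Set<U> wildcardEvaluators = new LinkedHashSet<>();

    void addWildcardEvaluator(U evaluator) {
        // add() already ignores duplicates, so no separate contains() scan is needed.
        wildcardEvaluators.add(evaluator);
    }

    List<U> getWildcardEvaluators() {
        // Copy out as a list, in insertion order.
        return new ArrayList<>(wildcardEvaluators);
    }

    // Intersects the smallest candidate list with all the other lists.
    // Each other list is copied into a HashSet first, so retainAll probes a
    // hash table instead of calling List.contains for every element of ret.
    static <T> Set<T> intersect(List<T> smallest, List<List<T>> all) {
        Set<T> ret = new HashSet<>(smallest);
        for (List<T> other : all) {
            if (other != smallest) {
                ret.retainAll(new HashSet<>(other));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        WildcardNodeSketch<String> node = new WildcardNodeSketch<>();
        node.addWildcardEvaluator("policy-1");
        node.addWildcardEvaluator("policy-2");
        node.addWildcardEvaluator("policy-1"); // duplicate, ignored
        System.out.println(node.getWildcardEvaluators()); // [policy-1, policy-2]

        List<String> a = Arrays.asList("policy-1", "policy-2");
        List<String> b = Arrays.asList("policy-2", "policy-3", "policy-4");
        System.out.println(intersect(a, Arrays.asList(a, b))); // [policy-2]
    }
}
```

The extra HashSet copy costs O(m) per list, so the intersection becomes roughly O(n + m) per list instead of O(n*m). Whether that trade-off pays off for small lists would need measuring.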


was (Author: starphin):
In the hive or hbase plugin, the 'column' resource is always the wildcard '*',
so a single TrieNode will hold 100,000 wildcardEvaluators. Building
RangerResourceTrie is then O(n^2), and querying evaluators is O(n*m).
{code:java}
// O(n^2) in RangerResourceTrie
void addWildcardEvaluator(U evaluator) {
    ...
    if (!wildcardEvaluators.contains(evaluator)) {
        wildcardEvaluators.add(evaluator);
    }
}

// O(n*m) in RangerPolicyRepository
private List<RangerPolicyEvaluator> getLikelyMatchPolicyEvaluators(Map<String, RangerResourceTrie> resourceTrie, RangerAccessResource resource) {
    ...
    ret = new HashSet<>(smallestSet);
    for (List<RangerPolicyEvaluator> resourceEvaluators : resourceEvaluatorsSet) {
        if (resourceEvaluators != smallestSet) {
            // remove policies from ret that are not in resourceEvaluators
            ret.retainAll(resourceEvaluators);
        }
    }
}{code}

> Improve performance of building and querying RangerResourceTrie
> ---------------------------------------------------------------
>
>                 Key: RANGER-2651
>                 URL: https://issues.apache.org/jira/browse/RANGER-2651
>             Project: Ranger
>          Issue Type: Improvement
>          Components: Ranger
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: star
>            Priority: Major
>         Attachments: building resource trie.png, getMostLikelyEvaluators.png, 
> ranger-2651.path
>
>
> When we have 100,000 policies, it takes a long time to initialize the hive
> plugin (more than 1 min) and to evaluate an access request (more than 1 s).
> Digging into the process, we found the Java stacks shown in the attached
> images. The List.indexOf method evidently accounts for most of the time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
