[ 
https://issues.apache.org/jira/browse/RANGER-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979918#comment-16979918
 ] 

star edited comment on RANGER-2651 at 11/26/19 7:32 AM:
--------------------------------------------------------

In the Hive or HBase plugin, the 'column' resource is always the wildcard '*', so a
single TrieNode will hold 100,000 wildcardEvaluators. This makes building the
RangerResourceTrie O(n^2) and querying evaluators O(n*m).

 
{code:java}
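// O(n^2) in RangerResourceTrie: wildcardEvaluators is a List, so contains()
// is a linear indexOf() scan and adding n evaluators costs O(n^2) in total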
void addWildcardEvaluator(U evaluator) {
    ...
    if (!wildcardEvaluators.contains(evaluator)) {
        wildcardEvaluators.add(evaluator);
    }
}

// O(n*m) in RangerPolicyRepository: retainAll() calls
// resourceEvaluators.contains() for every element of ret, and each call on
// a List is a linear scan
private List<RangerPolicyEvaluator> getLikelyMatchPolicyEvaluators(Map<String, RangerResourceTrie> resourceTrie, RangerAccessResource resource) {
    ...
    ret = new HashSet<>(smallestSet);
    for (List<RangerPolicyEvaluator> resourceEvaluators : resourceEvaluatorsSet) {
        if (resourceEvaluators != smallestSet) {
            // remove policies from ret that are not in resourceEvaluators
            ret.retainAll(resourceEvaluators);
        }
    }
}{code}
Actually, we have 1 million Hive policies. It took more than 2 hours (and was
still not done). After the improvement, it takes only 6 seconds.
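The comment does not include the patch itself. As a minimal sketch of one way to remove both costs, assuming it is acceptable to keep wildcard evaluators in a set and to copy each evaluator list into a HashSet before intersecting, the class below uses a LinkedHashSet for deduplication and a HashSet argument for retainAll(). `TrieFixSketch`, `WildcardBucket`, and `intersect` are hypothetical names, not Ranger APIs, and the actual RANGER-2651 patch may take a different approach. It needs Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TrieFixSketch {
    // Hypothetical stand-in for the wildcard bucket of a trie node (not Ranger's class).
    // A LinkedHashSet deduplicates in expected O(1) per add and keeps insertion order,
    // so adding n evaluators costs O(n) in total instead of O(n^2).
    static final class WildcardBucket<U> {
        private final Set<U> wildcardEvaluators = new LinkedHashSet<>();

        void addWildcardEvaluator(U evaluator) {
            // Set.add() ignores duplicates, so no separate contains() scan is needed
            wildcardEvaluators.add(evaluator);
        }

        List<U> asList() {
            return new ArrayList<>(wildcardEvaluators);
        }
    }

    // Hypothetical intersection helper: starts from the smallest list and retains only
    // elements present in every other list. Each other list is copied into a HashSet
    // once, so retainAll()'s per-element contains() check is expected O(1).
    static <T> Set<T> intersect(Collection<? extends List<T>> lists) {
        List<T> smallest = null;
        for (List<T> list : lists) {
            if (smallest == null || list.size() < smallest.size()) {
                smallest = list;
            }
        }
        Set<T> ret = new LinkedHashSet<>();
        if (smallest == null) {
            return ret;
        }
        ret.addAll(smallest);
        for (List<T> list : lists) {
            if (list != smallest) {
                ret.retainAll(new HashSet<>(list));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        WildcardBucket<Integer> bucket = new WildcardBucket<>();
        for (int i = 0; i < 100_000; i++) {
            bucket.addWildcardEvaluator(i % 50_000); // every value is added twice
        }
        System.out.println(bucket.asList().size()); // 50000

        Set<Integer> common = intersect(List.of(
                List.of(1, 2, 3, 4), List.of(2, 4, 6), List.of(4, 2, 8, 10)));
        System.out.println(common); // [2, 4]
    }
}
```

Copying each list into a HashSet costs O(m) once per list, so the intersection becomes roughly linear in the total number of evaluators instead of O(n*m), at the price of temporary extra memory.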


was (Author: starphin):
In the Hive or HBase plugin, the 'column' resource is always the wildcard '*', so a
single TrieNode will hold 100,000 wildcardEvaluators. This makes building the
RangerResourceTrie O(n) and querying evaluators O(n*m).

 
{code:java}
// O(n) in RangerResourceTrie
void addWildcardEvaluator(U evaluator) {
    ...
    if (!wildcardEvaluators.contains(evaluator)) {
        wildcardEvaluators.add(evaluator);
    }
}

// O(n*m) in RangerPolicyRepository
private List<RangerPolicyEvaluator> getLikelyMatchPolicyEvaluators(Map<String, RangerResourceTrie> resourceTrie, RangerAccessResource resource) {
    ...
    ret = new HashSet<>(smallestSet);
    for (List<RangerPolicyEvaluator> resourceEvaluators : resourceEvaluatorsSet) {
        if (resourceEvaluators != smallestSet) {
            // remove policies from ret that are not in resourceEvaluators
            ret.retainAll(resourceEvaluators);
        }
    }
}{code}
Actually, we have 1 million Hive policies. It took more than 2 hours (and was
still not done). After the improvement, it takes only 6 seconds.

> Improve performance of building and querying  RangerResourceTrie
> ----------------------------------------------------------------
>
>                 Key: RANGER-2651
>                 URL: https://issues.apache.org/jira/browse/RANGER-2651
>             Project: Ranger
>          Issue Type: Improvement
>          Components: Ranger
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: star
>            Assignee: Abhay Kulkarni
>            Priority: Major
>         Attachments: RANGER-2651.patch, building resource trie.png, 
> getMostLikelyEvaluators.png, ranger-2651.path
>
>
> With 100,000 policies, it takes a long time to initialize the Hive plugin
> (more than 1 min) and to evaluate an access request (more than 1 s). Digging
> into the process, we captured the Java stacks shown in the attached images.
> They show that most of the time is spent in the List.indexOf method.
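To make the List.indexOf cost concrete, here is a minimal sketch (hypothetical class and counter, not Ranger code) that counts equals() calls when deduplicating n distinct evaluators with the `if (!list.contains(e)) list.add(e)` pattern versus a HashSet. ArrayList.contains() delegates to indexOf(), a linear scan, so n distinct adds perform n(n-1)/2 comparisons; for n = 100,000 that is about 5 billion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IndexOfCostSketch {
    // Hypothetical evaluator stand-in that counts how often equals() is called.
    static final class Evaluator {
        static long equalsCalls = 0;
        final int id;

        Evaluator(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Evaluator && ((Evaluator) o).id == id;
        }

        @Override public int hashCode() { return id; }
    }

    // Mirrors "if (!list.contains(e)) list.add(e)": contains() delegates to indexOf(),
    // which compares against every existing element, so n distinct adds cost n*(n-1)/2.
    static long countListAdds(int n) {
        Evaluator.equalsCalls = 0;
        List<Evaluator> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Evaluator e = new Evaluator(i);
            if (!list.contains(e)) {
                list.add(e);
            }
        }
        return Evaluator.equalsCalls;
    }

    // Same deduplication with a HashSet: with distinct hash codes, equals() is never needed.
    static long countSetAdds(int n) {
        Evaluator.equalsCalls = 0;
        Set<Evaluator> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new Evaluator(i));
        }
        return Evaluator.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(countListAdds(1_000)); // 499500
        System.out.println(countSetAdds(1_000));  // 0
        long n = 100_000;
        System.out.println(n * (n - 1) / 2);      // 4999950000
    }
}
```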



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
