[ https://issues.apache.org/jira/browse/RANGER-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979918#comment-16979918 ]
star edited comment on RANGER-2651 at 11/22/19 7:10 AM:
--------------------------------------------------------

In the Hive or HBase plugin, the 'column' resource is always the wildcard '*', so with 100,000 policies a single TrieNode holds 100,000 wildcardEvaluators. Each addWildcardEvaluator() call is O(n) because of the List.contains() duplicate check, which makes building RangerResourceTrie O(n^2) overall. Querying evaluators is O(n*m): retainAll() calls List.contains() on an m-element list once for each of the n elements in ret.

{code:java}
// O(n) per call in RangerResourceTrie
void addWildcardEvaluator(U evaluator) {
    ...
    if (!wildcardEvaluators.contains(evaluator)) {
        wildcardEvaluators.add(evaluator);
    }
}

// O(n*m) in RangerPolicyRepository
private List<RangerPolicyEvaluator> getLikelyMatchPolicyEvaluators(Map<String, RangerResourceTrie> resourceTrie, RangerAccessResource resource) {
    ...
    ret = new HashSet<>(smallestSet);
    for (List<RangerPolicyEvaluator> resourceEvaluators : resourceEvaluatorsSet) {
        if (resourceEvaluators != smallestSet) {
            // remove policies from ret that are not in resourceEvaluators
            ret.retainAll(resourceEvaluators);
        }
    }
}
{code}

was (Author: starphin):
In hive or hbase plugin, resource 'column' is always wildcard '*'. There will be 100,000 wildcardEvaluators in single TrieNode. It will be O(n^2) complexity to build RangerResourceTrie or O(n*m) to query evaluators.
{code:java}
// O(n^2) in RangerResourceTrie
void addWildcardEvaluator(U evaluator) {
    ...
    if (!wildcardEvaluators.contains(evaluator)) {
        wildcardEvaluators.add(evaluator);
    }
}

// O(n*m) in RangerPolicyRepository
private List<RangerPolicyEvaluator> getLikelyMatchPolicyEvaluators(Map<String, RangerResourceTrie> resourceTrie, RangerAccessResource resource) {
    ...
    ret = new HashSet<>(smallestSet);
    for (List<RangerPolicyEvaluator> resourceEvaluators : resourceEvaluatorsSet) {
        if (resourceEvaluators != smallestSet) {
            // remove policies from ret that are not in resourceEvaluators
            ret.retainAll(resourceEvaluators);
        }
    }
}
{code}

> Improve performance of building and querying RangerResourceTrie
> ----------------------------------------------------------------
>
>                 Key: RANGER-2651
>                 URL: https://issues.apache.org/jira/browse/RANGER-2651
>             Project: Ranger
>          Issue Type: Improvement
>          Components: Ranger
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: star
>            Priority: Major
>         Attachments: building resource trie.png, getMostLikelyEvaluators.png, ranger-2651.path
>
>
> When we have 100,000 policies, it takes a long time to initialize the Hive
> plugin (more than 1 min) and to evaluate an access request (more than 1 s).
> Digging into the process, we found the Java stacks shown in the attached
> images. Clearly, most of the time is spent in the List.indexOf method.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
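Both hot spots in this issue come from linear List lookups: the List.contains() duplicate check in addWildcardEvaluator, and retainAll() probing a List argument. Below is a minimal sketch of one possible remedy, assuming hash-based membership checks are acceptable in place of those List scans. The names WildcardEvaluatorBucket and EvaluatorSets.intersect are hypothetical illustrations, not Ranger's API, and the attached ranger-2651 patch may take a different approach. A LinkedHashSet is used on the assumption that keeping the original insertion order is the conservative choice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the wildcard-evaluator bucket of one trie node
// (not Ranger's actual class). LinkedHashSet keeps insertion order, as the
// original List did, but rejects duplicates in O(1) average time.
class WildcardEvaluatorBucket<U> {
    private final Set<U> wildcardEvaluators = new LinkedHashSet<>();

    void addWildcardEvaluator(U evaluator) {
        // Set.add returns false and leaves the set unchanged for a duplicate,
        // so no separate linear contains() scan is needed.
        wildcardEvaluators.add(evaluator);
    }

    List<U> getWildcardEvaluators() {
        return new ArrayList<>(wildcardEvaluators);
    }
}

// Hypothetical helper (not Ranger's API) mirroring the intersection loop
// in getLikelyMatchPolicyEvaluators.
final class EvaluatorSets {
    private EvaluatorSets() {}

    static <T> Set<T> intersect(List<List<T>> evaluatorLists) {
        Set<T> ret = new HashSet<>();
        if (evaluatorLists.isEmpty()) {
            return ret;
        }
        // Start from the smallest list so the working set is as small as possible.
        List<T> smallest = evaluatorLists.get(0);
        for (List<T> list : evaluatorLists) {
            if (list.size() < smallest.size()) {
                smallest = list;
            }
        }
        ret.addAll(smallest);
        for (List<T> list : evaluatorLists) {
            if (list != smallest && !ret.isEmpty()) {
                // retainAll calls contains() on its argument once per element of ret;
                // wrapping the list in a HashSet makes each lookup O(1) on average
                // instead of a linear List scan.
                ret.retainAll(new HashSet<>(list));
            }
        }
        return ret;
    }
}
```

Copying each m-element list into a HashSet costs O(m), so each retainAll step drops from O(n*m) to roughly O(n + m). Likewise, building a bucket of n wildcard evaluators drops from O(n^2) to O(n) on average.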