[
https://issues.apache.org/jira/browse/RANGER-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979918#comment-16979918
]
star edited comment on RANGER-2651 at 11/22/19 7:10 AM:
--------------------------------------------------------
In the hive or hbase plugin, the 'column' resource is always the wildcard '*'.
With 100,000 policies there will be 100,000 wildcardEvaluators in a single
TrieNode. Each addWildcardEvaluator call is then O(n) while building
RangerResourceTrie, and querying evaluators is O(n*m).
{code:java}
// O(n) in RangerResourceTrie
void addWildcardEvaluator(U evaluator) {
    ...
    if (!wildcardEvaluators.contains(evaluator)) {
        wildcardEvaluators.add(evaluator);
    }
}

// O(n*m) in RangerPolicyRepository
private List<RangerPolicyEvaluator> getLikelyMatchPolicyEvaluators(
        Map<String, RangerResourceTrie> resourceTrie, RangerAccessResource resource) {
    ...
    ret = new HashSet<>(smallestSet);
    for (List<RangerPolicyEvaluator> resourceEvaluators : resourceEvaluatorsSet) {
        if (resourceEvaluators != smallestSet) {
            // remove policies from ret that are not in resourceEvaluators
            ret.retainAll(resourceEvaluators);
        }
    }
}
{code}
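One possible direction, as a minimal sketch only: keep the wildcard evaluators in a hash-based set so the duplicate check is O(1) on average, and intersect against hash sets rather than lists so that retainAll does not scan a list for every element. The class TrieSketch, its nested Node, and the intersect helper are hypothetical stand-ins and not Ranger's actual classes or the attached patch. The sketch also assumes evaluators have a consistent equals/hashCode.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; names do not correspond to real Ranger classes.
public class TrieSketch {

    /** Simplified trie node that stores wildcard evaluators in a set. */
    public static class Node<U> {
        // LinkedHashSet keeps insertion order and gives O(1) average add/contains.
        private final Set<U> wildcardEvaluators = new LinkedHashSet<>();

        public void addWildcardEvaluator(U evaluator) {
            // Set.add ignores duplicates, so no separate contains() scan is needed.
            wildcardEvaluators.add(evaluator);
        }

        public Set<U> getWildcardEvaluators() {
            return wildcardEvaluators;
        }
    }

    /** Intersects all candidate collections, starting from the smallest one. */
    public static <U> Set<U> intersect(List<? extends Collection<U>> candidates) {
        if (candidates.isEmpty()) {
            return new HashSet<>();
        }
        Collection<U> smallest = candidates.get(0);
        for (Collection<U> c : candidates) {
            if (c.size() < smallest.size()) {
                smallest = c;
            }
        }
        Set<U> ret = new HashSet<>(smallest);
        for (Collection<U> c : candidates) {
            if (c != smallest) {
                // retainAll calls lookup.contains() once per element of ret;
                // copying a list into a HashSet makes each lookup O(1) on average.
                Collection<U> lookup = (c instanceof Set) ? c : new HashSet<U>(c);
                ret.retainAll(lookup);
            }
        }
        return ret;
    }
}
```

Copying each list into a HashSet costs O(m) once per list, in place of an O(m) scan per retained element. Storing the evaluators as sets in the trie itself would avoid the copy as well.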
was (Author: starphin):
In the hive or hbase plugin, the 'column' resource is always the wildcard '*'.
There will be 100,000 wildcardEvaluators in a single TrieNode. It will be O(n^2)
complexity to build RangerResourceTrie or O(n*m) to query evaluators.
{code:java}
// O(n^2) in RangerResourceTrie
void addWildcardEvaluator(U evaluator) {
    ...
    if (!wildcardEvaluators.contains(evaluator)) {
        wildcardEvaluators.add(evaluator);
    }
}

// O(n*m) in RangerPolicyRepository
private List<RangerPolicyEvaluator> getLikelyMatchPolicyEvaluators(
        Map<String, RangerResourceTrie> resourceTrie, RangerAccessResource resource) {
    ...
    ret = new HashSet<>(smallestSet);
    for (List<RangerPolicyEvaluator> resourceEvaluators : resourceEvaluatorsSet) {
        if (resourceEvaluators != smallestSet) {
            // remove policies from ret that are not in resourceEvaluators
            ret.retainAll(resourceEvaluators);
        }
    }
}
{code}
> Improve performance of building and querying RangerResourceTrie
> ----------------------------------------------------------------
>
> Key: RANGER-2651
> URL: https://issues.apache.org/jira/browse/RANGER-2651
> Project: Ranger
> Issue Type: Improvement
> Components: Ranger
> Affects Versions: 2.0.0, 1.2.0
> Reporter: star
> Priority: Major
> Attachments: building resource trie.png, getMostLikelyEvaluators.png,
> ranger-2651.path
>
>
> When we have 100,000 policies, it takes a long time to initialize the hive
> plugin (more than 1 min) and to evaluate an access request (more than 1 s).
> Digging into the process, we captured the Java stacks shown in the attached
> images. The List.indexOf method is clearly what takes most of the time.