Fateh Singh created RANGER-4761:
-----------------------------------
Summary: Revisit ColumnFamilyCache to reduce memory footprint of HBase plugin
Key: RANGER-4761
URL: https://issues.apache.org/jira/browse/RANGER-4761
Project: Ranger
Issue Type: Improvement
Components: Ranger
Reporter: Fateh Singh
Assignee: Fateh Singh
* Map<String, Set<String>> getColumnFamilies(Map<byte[], ? extends
Collection<?>> families) was the original bottleneck, causing the Ranger Authz
CP (authorization coprocessor) to take 60% of Put RPC time. The function is
computationally heavy: it converts byte arrays to strings and converts each
Collection of columns to a set of strings.
* ColumnFamilyCache was introduced to fix this issue, but the caching does
not work: the columns and column families accessed for a table vary from
request to request, which results in many *cache misses*, and each miss still
calls getColumnFamilies(). The cost is twofold: additional memory for the
cache (the number of possible keys grows exponentially with the number of
columns, since every distinct combination of columns is a separate key) and
double computation (the new family map is serialized to check the cache, and
getColumnFamilies() is computed as well). For example, if the first call
accesses cf1:c1,c3,c4 and the second call accesses cf1:c1,c2,c5, the second
call is a cache miss because its set of columns differs from the first, and
both entries are added to the cache.
* Validation: added debug statements for cache hits and misses. Counting the
'Cache Miss' and 'Cache Hit' lines in the RegionServer log gives 10584 misses
and 0 hits, and the cache size reaches its default maximum of 1024:
{code:java}
[root@ccycloud-2 hbase]# grep 'Cache Miss' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
10584
[root@ccycloud-2 hbase]# grep 'Cache Hit' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
0
{code}
{code:java}
2024-02-09 17:12:31,093 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
2024-02-09 17:12:31,096 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
{code}
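A minimal, self-contained sketch of why an exact-match cache keeps missing here, assuming (hypothetically) that the cache key is the complete family-to-columns map of a request. The class ExactKeyCacheDemo and its methods are illustrative names, not Ranger's ColumnFamilyCache code, and the cached value is a placeholder for the expensive result. The three lookups replay the cf1:c1,c3,c4 / cf1:c1,c2,c5 example above and then repeat the first request exactly.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, NOT Ranger's ColumnFamilyCache: a bounded cache whose key
// is the exact set of families and columns touched by a request.
public class ExactKeyCacheDemo {
    static final int MAX_SIZE = 1024; // default max size reported in this issue

    // Access-ordered LinkedHashMap that evicts the least recently used entry.
    private final Map<Map<String, Set<String>>, String> cache =
            new LinkedHashMap<Map<String, Set<String>>, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(
                        Map.Entry<Map<String, Set<String>>, String> eldest) {
                    return size() > MAX_SIZE;
                }
            };
    int hits = 0;
    int misses = 0;

    // Looks the request up; on a miss, stores a placeholder for the expensive result.
    String lookup(Map<String, Set<String>> families) {
        String cached = cache.get(families);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        String computed = "result-for-" + families; // stands in for the real work
        cache.put(families, computed);
        return computed;
    }

    int size() {
        return cache.size();
    }

    // Builds a request touching the given columns of a single family.
    static Map<String, Set<String>> request(String family, String... columns) {
        Map<String, Set<String>> families = new TreeMap<>();
        families.put(family, new TreeSet<>(Set.of(columns)));
        return families;
    }

    public static void main(String[] args) {
        ExactKeyCacheDemo demo = new ExactKeyCacheDemo();
        demo.lookup(request("cf1", "c1", "c3", "c4")); // first call: miss
        demo.lookup(request("cf1", "c1", "c2", "c5")); // different columns: miss
        demo.lookup(request("cf1", "c1", "c3", "c4")); // exact repeat: hit
        System.out.println("hits=" + demo.hits + " misses=" + demo.misses
                + " size=" + demo.size()); // prints hits=1 misses=2 size=2
    }
}
{code}

In this sketch only an exact repeat of a column combination hits; every other combination becomes a new entry, so a workload touching many different columns fills the 1024-entry bound with keys that are rarely requested again.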
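Another issue:
The cache key computation has a bug: the byte array's identity (effectively
its address) is used as the HashMap key instead of its contents, so every
lookup is a cache miss irrespective of the request. Java arrays inherit
identity-based equals() and hashCode() from Object, and the logged key shows
the array's default toString() ([B@1b59403b) rather than the family name: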
{code:java}
2024-02-09 18:12:12,310 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Miss for user[atlas] for key: atlas_janus:[[B@1b59403b=null]
{code}
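The identity problem can be reproduced in plain Java, independent of Ranger. The sketch below (ByteArrayKeyDemo and its helper names are hypothetical) stores a value under one byte[] and probes with a second array holding the same bytes, as two separate requests would. It then repeats the experiment with java.nio.ByteBuffer.wrap(...) keys, whose equals()/hashCode() compare the remaining buffer contents. A content-based key such as ByteBuffer or a decoded String is one possible fix, not necessarily the one adopted for this issue.

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of the identity-key bug; hypothetical, not Ranger code.
public class ByteArrayKeyDemo {
    // Caches under the raw byte[] and probes with another array.
    static boolean hitsWithByteArrayKey(byte[] stored, byte[] probe) {
        Map<byte[], String> cache = new HashMap<>();
        cache.put(stored, "cached");
        return cache.get(probe) != null; // arrays use Object identity equals/hashCode
    }

    // Caches under a ByteBuffer view; ByteBuffer equals/hashCode compare contents.
    static boolean hitsWithByteBufferKey(byte[] stored, byte[] probe) {
        Map<ByteBuffer, String> cache = new HashMap<>();
        cache.put(ByteBuffer.wrap(stored), "cached");
        return cache.get(ByteBuffer.wrap(probe)) != null;
    }

    public static void main(String[] args) {
        // Two separate arrays with the same contents.
        byte[] first = "cf1".getBytes(StandardCharsets.UTF_8);
        byte[] second = "cf1".getBytes(StandardCharsets.UTF_8);

        System.out.println("byte[] key hit: " + hitsWithByteArrayKey(first, second));      // false
        System.out.println("ByteBuffer key hit: " + hitsWithByteBufferKey(first, second)); // true
        // Default array toString() is "[B@" + hex identity hash, as in the logged key.
        System.out.println("array toString: " + first.toString());
    }
}
{code}

A ByteBuffer key stays valid only while neither the buffer's position nor the wrapped array is modified after insertion; converting the family name to a String avoids that concern at the cost of a copy.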
The implementation needs to be revisited to reduce its memory footprint.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)