[ 
https://issues.apache.org/jira/browse/RANGER-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fateh Singh updated RANGER-4761:
--------------------------------
    Summary: Reduce memory footprint of hbase plugin  (was: Revisit 
ColumnFamilyCache to reduce memory footprint of hbase plugin)

> Reduce memory footprint of hbase plugin
> ---------------------------------------
>
>                 Key: RANGER-4761
>                 URL: https://issues.apache.org/jira/browse/RANGER-4761
>             Project: Ranger
>          Issue Type: Improvement
>          Components: Ranger
>            Reporter: Fateh Singh
>            Assignee: Fateh Singh
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Map<String, Set<String>> getColumnFamilies(Map<byte[], ? extends 
> Collection<?>> families) was the original bottleneck: it caused the Ranger 
> Authz CP to take 60% of Put RPC time. It is a computationally heavy function 
> that converts bytes to strings and type-casts each Collection to a set of 
> strings.
>  * ColumnFamilyCache was introduced to fix this issue, but the caching does 
> not work: the columns and column families accessed for a table vary from 
> request to request, which results in many *cache misses* and therefore in 
> getColumnFamilies() still being called. The cache thus adds a memory 
> requirement (it grows exponentially due to the large number of columns) as 
> well as double computation (the new family map is serialized to check the 
> cache, and getColumnFamilies() is computed anyway). E.g. if the first call 
> comes for cf1:c1,c3,c4 and the second call comes for cf1:c1,c2,c5, the second 
> call is a cache miss because its set of columns differs from that of the 
> first call. Both entries get added to the cache.
>  * Validation: added debug statements for cache hit and miss counts. Counting 
> the 'Cache Miss' and 'Cache Hit' lines in the log file shows only misses, and 
> the cache size reaches the default maximum of 1024.
> {code:java}
> [root@ccycloud-2 hbase]# grep 'Cache Miss' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
> 10584
> [root@ccycloud-2 hbase]# grep 'Cache Hit' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
> 0
> 0
> {code}
> {code:java}
> 2024-02-09 17:12:31,093 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
> 2024-02-09 17:12:31,096 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
> {code}
> Another issue:
> The key computation for the cache has a bug: the identity of the byte array 
> (its default toString(), e.g. [B@1b59403b) is used in the HashMap key instead 
> of its contents, so every lookup is a cache miss irrespective of the request.
> {code:java}
> 2024-02-09 18:12:12,310 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Miss for user[atlas] for key: atlas_janus:[[B@1b59403b=null]
> {code}
>  
> The implementation needs to be revisited to reduce the memory footprint.
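
The cf1:c1,c3,c4 versus cf1:c1,c2,c5 example in the description can be sketched as follows. This is only an illustration under stated assumptions, not the plugin's code: the class ColumnSetCacheDemo, its key() format, and the hit/miss counters are all hypothetical. It shows that a cache keyed on the full set of accessed columns misses whenever the column set changes, and stores one entry per distinct set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: a cache keyed on table + full family/column map.
public class ColumnSetCacheDemo {
    private final Map<String, Map<String, Set<String>>> cache = new HashMap<>();
    public int hits;
    public int misses;

    // Assumed key format: table name plus the sorted family -> columns map.
    public static String key(String table, Map<String, Set<String>> families) {
        Map<String, Set<String>> sorted = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : families.entrySet()) {
            sorted.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        return table + ":" + sorted;
    }

    // Counts a hit if this exact column set was seen before, otherwise
    // counts a miss and stores the new entry.
    public void lookup(String table, Map<String, Set<String>> families) {
        String k = key(table, families);
        if (cache.containsKey(k)) {
            hits++;
        } else {
            misses++;
            cache.put(k, families);
        }
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        ColumnSetCacheDemo demo = new ColumnSetCacheDemo();
        demo.lookup("t1", Map.of("cf1", Set.of("c1", "c3", "c4")));
        demo.lookup("t1", Map.of("cf1", Set.of("c1", "c2", "c5")));
        // Same table and family, different columns: two misses, two entries.
        System.out.println("hits=" + demo.hits + " misses=" + demo.misses
                + " size=" + demo.size());
        // prints "hits=0 misses=2 size=2"
    }
}
```

With many columns per family, the number of distinct column sets (and therefore of cache entries) can become very large, which is consistent with the cache filling to its maximum size without producing hits.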

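The byte-array key problem from the description can be reproduced in isolation. The sketch below is an assumption about how a key such as atlas_janus:[[B@1b59403b=null] could arise (string concatenation of a Map<byte[], ...> entry set); it is not taken from the Ranger source, and ByteArrayKeyDemo, identityKey(), contentKey(), and request() are hypothetical names. A Java byte[] does not override toString() or hashCode(), so a key built this way depends on the array instance rather than on its bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of identity-based versus content-based cache keys.
public class ByteArrayKeyDemo {

    // Builds a request map with one family and no columns (value is null).
    public static Map<byte[], Collection<?>> request(String family) {
        Map<byte[], Collection<?>> families = new HashMap<>();
        families.put(family.getBytes(StandardCharsets.UTF_8), null);
        return families;
    }

    // Problematic form: each byte[] key is rendered via Object.toString(),
    // e.g. "[B@1b59403b", which varies per array instance.
    public static String identityKey(String table,
                                     Map<byte[], ? extends Collection<?>> families) {
        return table + ":" + families.entrySet();
    }

    // Content-based form: decode the family bytes and sort the names.
    // Column qualifiers are ignored here to keep the sketch short.
    public static String contentKey(String table,
                                    Map<byte[], ? extends Collection<?>> families) {
        List<String> names = new ArrayList<>();
        for (byte[] family : families.keySet()) {
            names.add(new String(family, StandardCharsets.UTF_8));
        }
        Collections.sort(names);
        return table + ":" + names;
    }

    public static void main(String[] args) {
        // Two logically identical requests backed by different byte[] instances.
        Map<byte[], Collection<?>> first = request("cf1");
        Map<byte[], Collection<?>> second = request("cf1");

        System.out.println(identityKey("atlas_janus", first));
        System.out.println(identityKey("atlas_janus", second));
        System.out.println(contentKey("atlas_janus", first)
                .equals(contentKey("atlas_janus", second))); // prints "true"
    }
}
```

The two identityKey() values will in practice differ between the two requests even though the family name is the same, whereas contentKey() is stable. Any revised key would need to be derived from the byte contents (for example via a decoded string, as assumed here) to allow cache hits at all.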


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
