[ 
https://issues.apache.org/jira/browse/RANGER-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fateh Singh updated RANGER-4761:
--------------------------------
    Description: 
* Map<String, Set> getColumnFamilies(Map<byte[], ? extends Collection<?>> 
families) becomes a bottleneck in multiget and multiput workloads, where 
hundreds or thousands of columns may be accessed in a single request. It is a 
computationally heavy function that converts bytes to strings and type-casts 
each Collection to a set of strings.

The implementation needs to be revisited to reduce the memory footprint: 
allocate the family map (of type Map<String, Set>) lazily instead of 
allocating it ahead of time.
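
A minimal sketch of the lazy-allocation idea, not the actual Ranger change. Instead of building the whole string map for every request, the wrapper below keeps the raw family map and decodes a family's columns only when that family is first looked up. The class LazyColumnFamilies and its methods are hypothetical names introduced for illustration; the sketch also assumes qualifiers arrive as UTF-8 byte[] values, which may not hold for every HBase request type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Ranger: decodes a family's columns on first use.
final class LazyColumnFamilies {
    private final Map<byte[], ? extends Collection<?>> raw;
    private Map<String, Set<String>> decoded; // stays null until the first lookup

    public LazyColumnFamilies(Map<byte[], ? extends Collection<?>> raw) {
        this.raw = raw;
    }

    // Returns the column names of one family, converting bytes to strings only now.
    public Set<String> columnsOf(String family) {
        if (decoded == null) {
            decoded = new HashMap<>();
        }
        Set<String> columns = decoded.get(family);
        if (columns != null) {
            return columns;
        }
        columns = Collections.emptySet();
        byte[] wanted = family.getBytes(StandardCharsets.UTF_8);
        // Linear scan comparing contents, so it works whatever ordering the map uses.
        for (Map.Entry<byte[], ? extends Collection<?>> e : raw.entrySet()) {
            if (Arrays.equals(e.getKey(), wanted) && e.getValue() != null) {
                columns = new HashSet<>();
                for (Object q : e.getValue()) {
                    // Assumption: qualifiers are byte[]; anything else falls back to toString().
                    columns.add(q instanceof byte[]
                            ? new String((byte[]) q, StandardCharsets.UTF_8)
                            : String.valueOf(q));
                }
                break;
            }
        }
        decoded.put(family, columns);
        return columns;
    }

    // Number of families looked up (and therefore decoded or cached) so far.
    public int decodedFamilyCount() {
        return decoded == null ? 0 : decoded.size();
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Sample request touching two families; only the one that is queried gets decoded.
    public static LazyColumnFamilies sample() {
        Map<byte[], List<byte[]>> raw = new LinkedHashMap<>();
        raw.put(bytes("cf1"), Arrays.asList(bytes("c1"), bytes("c2")));
        raw.put(bytes("cf2"), Collections.singletonList(bytes("c9")));
        return new LazyColumnFamilies(raw);
    }

    public static void main(String[] args) {
        LazyColumnFamilies lazy = sample();
        System.out.println("decoded before lookup: " + lazy.decodedFamilyCount());
        System.out.println("cf1 columns: " + lazy.columnsOf("cf1"));
        System.out.println("decoded after lookup: " + lazy.decodedFamilyCount());
    }
}
```

In this sketch a request that is rejected or resolved at table or family level never pays for the per-column string conversion, which is where the saving would come from.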

  was:
* Map<String, Set<String>> getColumnFamilies(Map<byte[], ? extends 
Collection<?>> families) was the original bottleneck, causing the Ranger 
Authz CP to take 60% of Put RPC time. It is a computationally heavy function 
that converts bytes to strings and type-casts each Collection to a set of 
strings.
 * ColumnFamilyCache was introduced to fix this issue, but the caching does 
not work: the columns and column families accessed for a table vary from 
request to request, which results in many *cache misses*, so 
getColumnFamilies() still gets called. The cache therefore adds memory 
overhead (it grows exponentially with the number of columns) and doubles the 
computation (the new family map is serialized to check the cache, and 
getColumnFamilies is computed as well). E.g. the first call comes for 
cf1:c1,c3,c4 and the second call comes for cf1:c1,c2,c5; the second call is a 
cache miss because its set of columns differs from that of the first call. 
Both entries get added to the cache. 
 * Validation: added debug statements for cache hits and misses, then counted 
the 'Cache Miss' and 'Cache Hit' lines in the log file. The cache size reaches 
the default maximum of 1024.

{code:java}
[root@ccycloud-2 hbase]# grep 'Cache Miss' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
10584
[root@ccycloud-2 hbase]# grep 'Cache Hit' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
0
{code}
{code:java}
2024-02-09 17:12:31,093 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
2024-02-09 17:12:31,096 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
{code}
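
The miss pattern in the cf1:c1,c3,c4 / cf1:c1,c2,c5 example can be reproduced in isolation. The sketch below models a cache keyed on the full set of requested columns; the class ColumnSetCacheDemo, the table name "t1", and the key format are assumptions made for illustration and are not the actual ColumnFamilyCache code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a cache keyed on the full set of requested columns.
final class ColumnSetCacheDemo {
    // Assumed key format (illustrative only): table:family=[sorted columns]
    public static String cacheKey(String table, String family, Collection<String> columns) {
        return table + ":" + family + "=" + new TreeSet<>(columns);
    }

    // Replays requests against an unbounded cache and returns {hits, misses}.
    public static int[] replay(List<List<String>> requests) {
        Set<String> cache = new HashSet<>();
        int hits = 0;
        int misses = 0;
        for (List<String> columns : requests) {
            // Set.add returns true when the key was not cached yet (a miss).
            if (cache.add(cacheKey("t1", "cf1", columns))) {
                misses++;
            } else {
                hits++;
            }
        }
        return new int[] {hits, misses};
    }

    public static void main(String[] args) {
        int[] r = replay(Arrays.asList(
                Arrays.asList("c1", "c3", "c4"),
                Arrays.asList("c1", "c2", "c5")));
        System.out.println("hits=" + r[0] + " misses=" + r[1]); // prints "hits=0 misses=2"
    }
}
```

Only a request for exactly the same column set produces a hit. A family with n columns has up to 2^n - 1 distinct non-empty column sets, so a cache keyed this way fills quickly under multiget and multiput workloads.
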
Another issue:
The cache key computation has a bug: the address (identity) of the byte array, 
rather than its contents, is used as the key in the hash map, so every lookup 
is a cache miss regardless of the request.
{code:java}
2024-02-09 18:12:12,310 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Miss for user[atlas] for key: atlas_janus:[[B@1b59403b=null]
{code}
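
The `[B@1b59403b` fragment in the logged key is the default Object.toString() of a byte[], which depends on the array's identity and not on its contents. The sketch below illustrates that behaviour in plain Java and one possible content-based alternative; the class ByteArrayKeyDemo and its method names are hypothetical, and the UTF-8 decoding is an assumption about how family names are encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of identity-based versus content-based keys for byte arrays.
final class ByteArrayKeyDemo {
    // Buggy pattern: concatenating a byte[] calls Object.toString(), e.g. "[B@1b59403b".
    public static String identityKey(String table, byte[] family) {
        return table + ":" + family;
    }

    // Content-based key: equal bytes always produce equal strings.
    public static String contentKey(String table, byte[] family) {
        return table + ":" + new String(family, StandardCharsets.UTF_8);
    }

    // True if a map populated with `stored` finds an entry when queried with `probe`.
    // byte[] inherits equals() from Object, so only the very same array instance matches.
    public static boolean foundByArrayKey(byte[] stored, byte[] probe) {
        Map<byte[], Boolean> cache = new HashMap<>();
        cache.put(stored, Boolean.TRUE);
        return cache.containsKey(probe);
    }

    public static void main(String[] args) {
        byte[] first = "cf1".getBytes(StandardCharsets.UTF_8);
        byte[] second = "cf1".getBytes(StandardCharsets.UTF_8); // same contents, new array
        System.out.println("identity key: " + identityKey("atlas_janus", first));
        System.out.println("content key:  " + contentKey("atlas_janus", first));
        System.out.println("found by array key: " + foundByArrayKey(first, second)); // prints false
        System.out.println("content keys equal: "
                + contentKey("atlas_janus", first).equals(contentKey("atlas_janus", second))); // prints true
    }
}
```

Each request deserializes into fresh byte arrays, so an identity-based key cannot match an earlier entry, which is consistent with the zero 'Cache Hit' count above.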
 

The implementation needs to be revisited to reduce memory footprint


> Reduce memory footprint of hbase plugin
> ---------------------------------------
>
>                 Key: RANGER-4761
>                 URL: https://issues.apache.org/jira/browse/RANGER-4761
>             Project: Ranger
>          Issue Type: Improvement
>          Components: Ranger
>            Reporter: Fateh Singh
>            Assignee: Fateh Singh
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Map<String, Set> getColumnFamilies(Map<byte[], ? extends Collection<?>> 
> families) becomes a bottleneck in multiget and multiput workloads, where 
> hundreds or thousands of columns may be accessed in a single request. It is a 
> computationally heavy function that converts bytes to strings and type-casts 
> each Collection to a set of strings.
> The implementation needs to be revisited to reduce the memory footprint: 
> allocate the family map (of type Map<String, Set>) lazily instead of 
> allocating it ahead of time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
