fateh288 opened a new pull request, #307: URL: https://github.com/apache/ranger/pull/307
…of ahead of time memory allocation for family map of type Map<String, Set<String>>. Removed ColumnFamilyCache.

Impact: Memory and computational benefit. Cache memory is saved, and memory use drops sharply when a large number of columns is accessed. ColumnFamilyCache always misses, both because access patterns are non-deterministic and because of a bug in which the address of the byte array is used as the cache key, so removing it also saves computation. The memory footprint will shrink further once the column auth optimization supported by RANGER-4670 is enabled.

## What changes were proposed in this pull request?

The function Map<String, Set<String>> getColumnFamilies(Map<byte[], ? extends Collection<?>> families) was the original bottleneck that caused Ranger Authz CP to take 60% of Put RPC time. It is computationally heavy: it converts bytes to strings and type-casts a Collection to a set of strings. ColumnFamilyCache was introduced to fix this issue. The caching does not work, however, because the columns and column families accessed for a table vary from call to call. This results in many cache misses, and on every miss getColumnFamilies() is called anyway. The net effect is additional memory for the cache (which grows exponentially due to the large number of columns) plus double computation: the new family map is serialized to check the cache, and getColumnFamilies is still computed.

E.g. the first call comes for cf1:c1,c3,c4 and the second call comes for cf1:c1,c2,c5. The second call is a cache miss because the set of columns accessed differs from the first call. Both entries get added to the cache.

Validation: Added debug statements for cache hit and miss counts.
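The byte-array key problem mentioned above can be illustrated with plain Java, independent of Ranger. This is a minimal sketch, not the removed ColumnFamilyCache code: the class and method names (`ByteArrayKeyDemo`, `lookupWithArrayKey`, `lookupWithStringKey`) are hypothetical. It only shows the general behavior that Java arrays compare by identity, so a map keyed on a `byte[]` misses for a second array with the same content.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Ranger or HBase.
class ByteArrayKeyDemo {

    // Stores a value under one byte[] and looks it up with a different
    // byte[] that has the same content. Arrays inherit equals/hashCode
    // from Object, so the lookup compares identity and misses.
    static String lookupWithArrayKey() {
        Map<byte[], String> cache = new HashMap<>();
        cache.put("cf1".getBytes(StandardCharsets.UTF_8), "cached");
        return cache.get("cf1".getBytes(StandardCharsets.UTF_8));
    }

    // Same lookup, but keyed on the decoded content, which compares by value.
    static String lookupWithStringKey() {
        Map<String, String> cache = new HashMap<>();
        byte[] first = "cf1".getBytes(StandardCharsets.UTF_8);
        byte[] second = "cf1".getBytes(StandardCharsets.UTF_8);
        cache.put(new String(first, StandardCharsets.UTF_8), "cached");
        return cache.get(new String(second, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] a = "cf1".getBytes(StandardCharsets.UTF_8);
        byte[] b = "cf1".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(a, b));   // true: same content
        System.out.println(a.equals(b));           // false: different objects
        System.out.println(lookupWithArrayKey());  // null: cache miss
        System.out.println(lookupWithStringKey()); // cached: cache hit
    }
}
```

Even with a content-based key, the cf1:c1,c3,c4 versus cf1:c1,c2,c5 example would still miss, because the two requests carry different column sets.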
Word count in the log file for Cache Miss and Cache Hit. The cache size reaches the max default size of 1024:

```
[root@ccycloud-2 hbase]# grep 'Cache Miss' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
10584
[root@ccycloud-2 hbase]# grep 'Cache Hit' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
0
2024-02-09 17:12:31,093 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
2024-02-09 17:12:31,096 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
```

The patch allocates the family map lazily instead of allocating the family map of type Map<String, Set<String>> ahead of time. ColumnFamilyCache is removed from the implementation since it was always a miss.

## How was this patch tested?

Unit test cases pass for hbase plugin
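One way to read the lazy allocation described in this pull request is sketched below. It is an assumption-laden illustration, not the code from the patch: the class `LazyFamilyMap`, its `conversions` counter, and the choice of `byte[]` as the qualifier element type are all hypothetical (the real coprocessor may receive other element types). The idea shown is simply that the byte-to-string conversion and the resulting Map<String, Set<String>> are created on first use rather than up front.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not the implementation in the Ranger patch.
class LazyFamilyMap {
    private final Map<byte[], ? extends Collection<?>> raw;
    private Map<String, Set<String>> converted; // allocated on first use only
    int conversions = 0;                        // demo counter, not needed in real code

    LazyFamilyMap(Map<byte[], ? extends Collection<?>> raw) {
        this.raw = raw;
    }

    // Converts the raw family map the first time it is requested and
    // reuses the result afterwards. Not thread-safe; assumes one
    // instance per request.
    Map<String, Set<String>> get() {
        if (converted == null) {
            converted = convert(raw);
            conversions++;
        }
        return converted;
    }

    // Assumption: qualifiers are byte[]; other element types are skipped.
    static Map<String, Set<String>> convert(Map<byte[], ? extends Collection<?>> families) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<byte[], ? extends Collection<?>> entry : families.entrySet()) {
            String family = new String(entry.getKey(), StandardCharsets.UTF_8);
            Set<String> columns = new HashSet<>();
            Collection<?> qualifiers = entry.getValue();
            if (qualifiers != null) {
                for (Object qualifier : qualifiers) {
                    if (qualifier instanceof byte[]) {
                        columns.add(new String((byte[]) qualifier, StandardCharsets.UTF_8));
                    }
                }
            }
            result.put(family, columns);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<byte[], List<byte[]>> raw = new HashMap<>();
        raw.put("cf1".getBytes(StandardCharsets.UTF_8),
                Arrays.asList("c1".getBytes(StandardCharsets.UTF_8),
                              "c3".getBytes(StandardCharsets.UTF_8)));
        LazyFamilyMap lazy = new LazyFamilyMap(raw);
        System.out.println(lazy.conversions); // nothing converted yet
        System.out.println(lazy.get());       // conversion happens here
        lazy.get();                           // reuses the converted map
        System.out.println(lazy.conversions);
    }
}
```

In this sketch a request that never needs column-level detail never pays for the conversion or the extra map, which is the kind of saving the pull request describes.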
