Stephen O'Donnell created HADOOP-13263:
------------------------------------------

             Summary: Reload cached groups in background after expiry
                 Key: HADOOP-13263
                 URL: https://issues.apache.org/jira/browse/HADOOP-13263
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Stephen O'Donnell


In HADOOP-11238 the Guava cache was introduced to allow refreshes on the 
Namenode group cache to run in the background, avoiding many slow group 
lookups. Even with this change, I have seen quite a few clusters with issues 
due to slow group lookups. The problem is most prevalent in HA clusters, where 
a slow group lookup on the hdfs user can fail to return for over 45 seconds 
causing the Failover Controller to kill it.

The way the current Guava cache implementation works is approximately:

1) On initial load, the first thread to request groups for a given user blocks 
until it returns. Any subsequent threads requesting that user block until that 
first thread populates the cache.

2) When the key expires, the first thread to hit the cache after expiry blocks. 
While it is blocked, other threads will return the old value.

I feel it is this blocking thread that still gives the Namenode issues on slow 
group lookups. If the call from the FC is the one that blocks and lookups are 
slow, if can cause the NN to be killed.

Guava has the ability to refresh expired keys completely in the background, 
where the first thread that hits an expired key schedules a background cache 
reload, but still returns the old value. Then the cache is eventually updated. 
This patch introduces this background reload feature. There are two new 
parameters:

1) hadoop.security.groups.cache.background.reload - default false to keep the 
current behaviour. Set to true to enable a small thread pool and background 
refresh for expired keys

2) hadoop.security.groups.cache.background.reload.threads - only relevant if 
the above is set to true. Controls how many threads are in the background 
refresh pool. Default is 1, which is likely to be enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Reply via email to