TrackerDistributedCacheManager takes a blocking lock for a loop that executes 10K times --------------------------------------------------------------------------------------
Key: MAPREDUCE-1909
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1909
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Dick King
Assignee: Dick King

In {{TrackerDistributedCacheManager.java}}, in the portion where the cache is cleaned up, the lock is taken on the main hash table and then all the entries are scanned to see whether they can be deleted. That is a long time to hold the lock, since the table is likely to have 10K entries. I would like to reduce the longest lock duration by maintaining the set of {{CacheStatus}} es to delete incrementally:

1: Let there be a new {{HashSet}}, {{deleteSet}}, that's protected under {{synchronized(cachedArchives)}}
2: When {{refcount}} is decreased to 0, move the {{CacheStatus}} from {{cachedArchives}} to {{deleteSet}}
3: When seeking an existing {{CacheStatus}}, look in {{deleteSet}} if it isn't in {{cachedArchives}}
4: When {{refcount}} is increased from 0 to 1 in a pre-existing {{CacheStatus}} [see 3:, above], move the {{CacheStatus}} from {{deleteSet}} to {{cachedArchives}}
5: When we clean the cache, under {{synchronized(cachedArchives)}}, move {{deleteSet}} to a local variable and create a new empty {{HashSet}}. This is constant time.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
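The five steps above could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual {{TrackerDistributedCacheManager}} code: the class and method names ({{DeleteSetSketch}}, {{acquire}}, {{release}}, {{drainDeleteSet}}) and the {{CacheStatus}} stub are assumptions, and a map keyed by cache path is used in place of a raw {{HashSet}} so that step 3's lookup is cheap.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the incremental delete-set scheme proposed above. */
public class DeleteSetSketch {

    /** Stub standing in for the real CacheStatus; only refcount matters here. */
    static class CacheStatus {
        int refcount;
        final String path;
        CacheStatus(String path) { this.path = path; }
    }

    // Step 1: both structures are guarded by synchronized (cachedArchives).
    private final Map<String, CacheStatus> cachedArchives = new HashMap<>();
    // Keyed by path (rather than a raw HashSet) so step 3 is an O(1) lookup.
    private Map<String, CacheStatus> deleteSet = new HashMap<>();

    /** Steps 3 and 4: find or create an entry, resurrecting it from deleteSet
     *  if its refcount had previously dropped to 0. */
    public CacheStatus acquire(String path) {
        synchronized (cachedArchives) {
            CacheStatus s = cachedArchives.get(path);
            if (s == null) {
                s = deleteSet.remove(path);     // step 3: check deleteSet
                if (s == null) {
                    s = new CacheStatus(path);  // genuinely new entry
                }
                cachedArchives.put(path, s);    // step 4: move it back
            }
            s.refcount++;
            return s;
        }
    }

    /** Step 2: when refcount drops to 0, move the entry into deleteSet. */
    public void release(String path) {
        synchronized (cachedArchives) {
            CacheStatus s = cachedArchives.get(path);
            if (s != null && --s.refcount == 0) {
                cachedArchives.remove(path);
                deleteSet.put(path, s);
            }
        }
    }

    /** Step 5: swap deleteSet out in constant time under the lock; the
     *  caller can then delete the returned entries without holding it. */
    public Map<String, CacheStatus> drainDeleteSet() {
        synchronized (cachedArchives) {
            Map<String, CacheStatus> toDelete = deleteSet;
            deleteSet = new HashMap<>();
            return toDelete;
        }
    }
}
```

The key point is that the lock is never held while iterating over all 10K entries: cleanup swaps a reference under the lock and does the slow deletion work outside it.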