[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated MAPREDUCE-2494:
-------------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Just pushed this to 0.20-security branch. Thanks bobby!

Make the distributed cache delete entires using LRU priority

    Key: MAPREDUCE-2494
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: distributed-cache
    Affects Versions: 0.20.205.0, 0.21.0
    Reporter: Robert Joseph Evans
    Assignee: Robert Joseph Evans
    Fix For: 0.20.205.0, 0.23.0
    Attachments: MAPREDUCE-2494-20.20X-V1.patch, MAPREDUCE-2494-20.20X-V3.patch, MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch

Currently the distributed cache waits until a cache directory is above a preconfigured threshold, at which point it deletes all entries that are not currently being used. It seems like we would get far fewer cache misses if we kept some of them around, even when they are not being used. We should add a configurable percentage as a goal for how much of the cache should remain clear when not in use, and select objects to delete based on how recently they were used, and possibly also on how large they are or how difficult it is to download them again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Release Note: Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the target percentage of the local distributed cache that should be kept in between garbage collection runs. In practice it will delete unused distributed cache entries in LRU order until the size of the cache is less than mapreduce.tasktracker.cache.local.keep.pct of the maximum cache size. This is a floating point value between 0.0 and 1.0. The default is 0.95. (was: Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the minimum percentage of the local distributed cache that should be kept in between garbage collection runs. This is a floating point value between 0.0 and 1.0. The default is 0.95.)
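The eviction rule in the release note above can be sketched in Java as follows. This is a minimal illustration, not the TaskTracker's actual code: the class `KeepPctSketch`, its `Entry` type, the reference count used to mark an entry as "in use", and the choice to trigger collection only once the cache exceeds its maximum size are all assumptions made for the example. Only the idea of deleting unused entries in LRU order until the cache is smaller than `keep.pct` of the maximum comes from the release note.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of keep.pct-style eviction; not the actual Hadoop implementation. */
final class KeepPctSketch {
    /** Hypothetical cache entry: size in bytes plus a count of tasks currently using it. */
    static final class Entry {
        final long size;
        final int refCount;
        Entry(long size, int refCount) { this.size = size; this.refCount = refCount; }
    }

    // accessOrder=true makes iteration run from least to most recently accessed.
    private final LinkedHashMap<String, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxSize;    // assumed maximum cache size in bytes
    private final double keepPct;  // plays the role of mapreduce.tasktracker.cache.local.keep.pct
    private long currentSize = 0;

    KeepPctSketch(long maxSize, double keepPct) {
        this.maxSize = maxSize;
        this.keepPct = keepPct;
    }

    void put(String key, long size, int refCount) {
        Entry old = entries.put(key, new Entry(size, refCount));
        if (old != null) currentSize -= old.size;
        currentSize += size;
    }

    /** Marks an entry as recently used by moving it to the end of the access order. */
    void touch(String key) { entries.get(key); }

    boolean contains(String key) { return entries.containsKey(key); }

    long size() { return currentSize; }

    /** Deletes unused entries in LRU order until the cache is smaller than keepPct * maxSize. */
    void collect() {
        if (currentSize <= maxSize) return;  // assumption: collect only when over the limit
        long target = (long) (keepPct * maxSize);
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (currentSize >= target && it.hasNext()) {
            Entry e = it.next().getValue();
            if (e.refCount == 0) {           // entries still in use are never deleted
                it.remove();
                currentSize -= e.size;
            }
        }
    }

    public static void main(String[] args) {
        KeepPctSketch cache = new KeepPctSketch(100, 0.75);
        cache.put("a", 30, 0);
        cache.put("b", 30, 1);  // in use, so it must survive collection
        cache.put("c", 30, 0);
        cache.put("d", 30, 0);
        cache.touch("a");       // "a" becomes the most recently used entry
        cache.collect();
        System.out.println(cache.entries.keySet() + " " + cache.size());
    }
}
```

In this run the cache holds 120 bytes against a limit of 100, so collection removes the least recently used unused entries ("c", then "d") and stops once the size falls below the 75-byte target, leaving the in-use "b" and the recently touched "a".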
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Status: Open (was: Patch Available)
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Attachment: MAPREDUCE-2494-20.20X-V3.patch

In light of MAPREDUCE-2572, I have updated the default percentage to keep around to 95% instead of the original 75%.
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Release Note: Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the minimum percentage of the local distributed cache that should be kept in between garbage collection runs. This is a floating point value between 0.0 and 1.0. The default is 0.95. (was: Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the minimum percentage of the local distributed cache that should be kept in between garbage collection runs. This is a floating point value between 0.0 and 1.0. The default is 0.75.)
    Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Fix Version/s: 0.20.205.0
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Affects Version/s: 0.20.205.0
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Attachment: MAPREDUCE-2494-20.20X-V1.patch

This patch also takes into account the issues shown with MAPREDUCE-2573. This is for the security branch.

    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 3 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Status: Patch Available (was: Reopened)
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-2494:
-------------------------------------

    Resolution: Fixed
    Fix Version/s: 0.23.0
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

+1

I committed this. Thanks, Robert!
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Status: Open (was: Patch Available)

In a separate conversation, Chris mentioned that sleeps in the tests are bad, and that if they have to be there, they should be tied together through shared constant values. I am reworking the tests to use constant values.
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Attachment: MAPREDUCE-2494-V2.patch
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Attachment: MAPREDUCE-2494-V1.patch

First patch. It uses a LinkedHashMap to keep track of the LRU ordering of cachedArchives, so that they can be removed in an orderly manner.
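As a rough illustration of how a `LinkedHashMap` can track LRU ordering, the sketch below uses the standard three-argument constructor with `accessOrder` set to `true`, so that iteration runs from least to most recently used. Whether the patch uses this constructor or maintains the order some other way is not stated in the message, so treat that as an assumption; the class `LruOrderSketch` and its methods are hypothetical names, and only the field name `cachedArchives` is borrowed from the comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

/** Hypothetical sketch of LRU ordering with LinkedHashMap; not the patch's actual code. */
final class LruOrderSketch {
    // The third constructor argument (accessOrder=true) switches the map from
    // insertion order to access order: each get() or put() moves the key to the end.
    private final LinkedHashMap<String, Long> cachedArchives =
            new LinkedHashMap<>(16, 0.75f, true);

    /** Adds an archive with its size in bytes; new keys start as most recently used. */
    void add(String key, long size) { cachedArchives.put(key, size); }

    /** Records a use of the archive, making it the most recently used entry. */
    void use(String key) { cachedArchives.get(key); }

    /** Returns keys from least to most recently used, the order in which to delete them. */
    List<String> evictionOrder() { return new ArrayList<>(cachedArchives.keySet()); }

    public static void main(String[] args) {
        LruOrderSketch sketch = new LruOrderSketch();
        sketch.add("a", 10L);
        sketch.add("b", 20L);
        sketch.add("c", 30L);
        sketch.use("a");                             // "a" moves to the end
        System.out.println(sketch.evictionOrder());  // prints [b, c, a]
    }
}
```

Keeping the ordering inside the map means a cleanup pass can walk the entries from the front and stop as soon as enough space has been freed, without sorting by timestamp first.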
[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Release Note: Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the minimum percentage of the local distributed cache that should be kept in between garbage collection runs. This is a floating point value between 0.0 and 1.0. The default is 0.75.
    Status: Patch Available (was: Open)