[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-08-01 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-2494:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Just pushed this to the 0.20-security branch. Thanks, Bobby!

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.20.205.0, 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.20.205.0, 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, 
 MAPREDUCE-2494-20.20X-V3.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch


 Currently the distributed cache waits until a cache directory grows above a 
 preconfigured threshold, at which point it deletes all entries that are not 
 currently in use.  It seems like we would get far fewer cache misses if we 
 kept some of them around, even when they are not being used.  We should add 
 a configurable percentage as a goal for how much of the cache should remain 
 clear when not in use, and select objects to delete based on how recently 
 they were used, and possibly also on how large they are and how difficult 
 it is to download them again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-07-26 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Release Note: Added config option 
mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker.  It is the 
target percentage of the local distributed cache that should be kept in between 
garbage collection runs.  In practice it will delete unused distributed cache 
entries in LRU order until the size of the cache is less than 
mapreduce.tasktracker.cache.local.keep.pct of the maximum cache size.  This is 
a floating point value between 0.0 and 1.0.  The default is 0.95.  (was: Added 
config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker.  
It is the minimum percentage of the local distributed cache that should be kept 
in between garbage collection runs.  This is a floating point value between 0.0 
and 1.0.  The default is 0.95.)
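
The behaviour described in this release note can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class KeepPctSketch, its Entry type and the evict method are hypothetical names, the map is assumed to iterate from least to most recently used, and the check that decides when a cleanup run starts is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the TaskTracker implementation.
class KeepPctSketch {
    // Hypothetical cache entry: size on disk and number of tasks using it.
    static final class Entry {
        final long sizeBytes;
        final int refCount;

        Entry(long sizeBytes, int refCount) {
            this.sizeBytes = sizeBytes;
            this.refCount = refCount;
        }
    }

    // Removes unused entries in iteration order (assumed to be least recently
    // used first) until the total size is less than keepPct * maxSizeBytes.
    // Entries that are still in use are skipped. Returns the evicted keys.
    static List<String> evict(Map<String, Entry> lru, long maxSizeBytes, float keepPct) {
        long target = (long) (maxSizeBytes * keepPct);
        long total = 0;
        for (Entry e : lru.values()) {
            total += e.sizeBytes;
        }
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Entry>> it = lru.entrySet().iterator();
        while (total >= target && it.hasNext()) {
            Map.Entry<String, Entry> candidate = it.next();
            if (candidate.getValue().refCount == 0) {
                total -= candidate.getValue().sizeBytes;
                evicted.add(candidate.getKey());
                it.remove();
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        // Insertion order stands in for LRU order in this example.
        Map<String, Entry> lru = new LinkedHashMap<>();
        lru.put("a.jar", new Entry(40, 0));
        lru.put("b.zip", new Entry(30, 1)); // still in use, must survive
        lru.put("c.txt", new Entry(30, 0));
        System.out.println(evict(lru, 100, 0.5f));
    }
}
```

With a maximum size of 100 and a keep percentage of 0.5, the run in main has to get the cache below 50, so it evicts the two unused entries and keeps the one that is still referenced.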

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.20.205.0, 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.20.205.0, 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, 
 MAPREDUCE-2494-20.20X-V3.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch







[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-07-25 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Status: Open  (was: Patch Available)

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0, 0.20.205.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.20.205.0, 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch







[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-07-25 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Attachment: MAPREDUCE-2494-20.20X-V3.patch

In light of MAPREDUCE-2572, I have updated the default percentage to keep 
around to 95% instead of the original 75%.

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.20.205.0, 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.20.205.0, 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, 
 MAPREDUCE-2494-20.20X-V3.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch







[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-07-25 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Release Note: Added config option 
mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker.  It is the 
minimum percentage of the local distributed cache that should be kept in 
between garbage collection runs.  This is a floating point value between 0.0 
and 1.0.  The default is 0.95.  (was: Added config option 
mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker.  It is the 
minimum percentage of the local distributed cache that should be kept in 
between garbage collection runs.  This is a floating point value between 0.0 
and 1.0.  The default is 0.75.)
  Status: Patch Available  (was: Open)

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0, 0.20.205.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.20.205.0, 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, 
 MAPREDUCE-2494-20.20X-V3.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch







[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-07-20 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Fix Version/s: 0.20.205.0

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.20.205.0, 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.20.205.0, 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch







[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-07-14 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Affects Version/s: 0.20.205.0

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.20.205.0, 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch







[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-06-08 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Attachment: MAPREDUCE-2494-20.20X-V1.patch

This patch also takes into account the issues shown with MAPREDUCE-2573.  This 
is for the security branch.

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.


 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch





[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-06-08 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Status: Patch Available  (was: Reopened)

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2494-20.20X-V1.patch, MAPREDUCE-2494-V1.patch, 
 MAPREDUCE-2494-V2.patch





[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-2494:
-

   Resolution: Fixed
Fix Version/s: 0.23.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks, Robert!

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch





[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Status: Open  (was: Patch Available)

In a separate conversation, Chris mentioned that sleeps in tests are bad and 
that, if they have to be there, they should be tied together through shared 
constant values.  I am reworking the tests to use constant values.

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch





[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Status: Patch Available  (was: Open)

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch





[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Attachment: MAPREDUCE-2494-V2.patch

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch





[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-16 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Attachment: MAPREDUCE-2494-V1.patch

First patch.  It uses a LinkedHashMap to keep track of the LRU ordering of 
cachedArchives, so that they can be removed in an orderly manner.
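
As a rough illustration of how a LinkedHashMap can track LRU ordering, the standard three-argument constructor with accessOrder set to true makes iteration run from the least to the most recently accessed key. The sketch below is not taken from the patch; the class LruOrderSketch, its helper methods and the file names are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the code from MAPREDUCE-2494-V1.patch.
class LruOrderSketch {
    // accessOrder = true: get() and put() move the touched key to the
    // most recently used end of the iteration order.
    static Map<String, Long> newLruMap() {
        return new LinkedHashMap<String, Long>(16, 0.75f, true);
    }

    // Snapshot of the keys from least to most recently used.
    // Iterating the key set does not count as an access.
    static List<String> lruOrder(Map<String, Long> cache) {
        return new ArrayList<String>(cache.keySet());
    }

    public static void main(String[] args) {
        Map<String, Long> cache = newLruMap();
        cache.put("a.jar", 10L);
        cache.put("b.zip", 20L);
        cache.put("c.txt", 30L);
        cache.get("a.jar"); // a.jar becomes the most recently used entry
        System.out.println(lruOrder(cache)); // prints [b.zip, c.txt, a.jar]
    }
}
```

Walking the map from the front therefore visits the best eviction candidates first, which is what makes it convenient for deleting entries in LRU order.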

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch





[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-16 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Release Note: Added config option 
mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker.  It is the 
minimum percentage of the local distributed cache that should be kept in 
between garbage collection runs.  This is a floating point value between 0.0 
and 1.0.  The default is 0.75.
  Status: Patch Available  (was: Open)

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch


