[jira] [Commented] (MAPREDUCE-2494) Make the distributed cache delete entries using LRU priority

Owen O'Malley (JIRA) Fri, 13 May 2011 09:15:30 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033105#comment-13033105 ]


Owen O'Malley commented on MAPREDUCE-2494:
------------------------------------------

I was also surprised when I walked through the code and saw that it was 
deleting all currently unused objects.

I think a straight LRU with a goal percentage of the threshold makes sense. For 
a first pass, I think object size should be ignored until we better 
understand how it interacts with the rest of the system.

So something like:
{code}
when (free space on partition < free-limit or 
      disk usage of dist cache > cache-limit) and 
     time since last purge > 10 minutes:
  purge LRU unused objects to reach goal size of cache-limit*cache-usage-goal
{code}
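A minimal Java sketch of the policy above follows. It assumes a hypothetical `Entry` type recording size, last-use time, and a reference count; `shouldPurge`, `selectVictims`, `freeLimit`, `cacheLimit`, and `usageGoal` are illustrative stand-ins for the pseudocode's free-limit, cache-limit, and cache-usage-goal, not Hadoop's actual distributed cache code. As proposed, size does not affect the eviction order; it is only used to tell when the goal has been reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LruPurgeSketch {
    /** Hypothetical view of one cached object: bytes on disk, last use time, active users. */
    static final class Entry {
        final String key;
        final long sizeBytes;
        final long lastUsedMillis;
        final int refCount;

        Entry(String key, long sizeBytes, long lastUsedMillis, int refCount) {
            this.key = key;
            this.sizeBytes = sizeBytes;
            this.lastUsedMillis = lastUsedMillis;
            this.refCount = refCount;
        }
    }

    // The "10 minutes" minimum gap between purges from the proposal.
    static final long MIN_PURGE_INTERVAL_MILLIS = 10L * 60 * 1000;

    /** Purge only when the partition or the cache is over its limit, and not too often. */
    static boolean shouldPurge(long freeBytes, long freeLimit, long cacheBytes, long cacheLimit,
                               long lastPurgeMillis, long nowMillis) {
        boolean overLimit = freeBytes < freeLimit || cacheBytes > cacheLimit;
        return overLimit && nowMillis - lastPurgeMillis > MIN_PURGE_INTERVAL_MILLIS;
    }

    /**
     * Picks unused entries (refCount == 0), least recently used first, until the
     * remaining usage is at most cacheLimit * usageGoal.
     */
    static List<Entry> selectVictims(List<Entry> entries, long cacheLimit, double usageGoal) {
        long goalBytes = (long) (cacheLimit * usageGoal);
        long usedBytes = 0;
        List<Entry> unused = new ArrayList<>();
        for (Entry e : entries) {
            usedBytes += e.sizeBytes;
            if (e.refCount == 0) {
                unused.add(e);
            }
        }
        unused.sort(Comparator.comparingLong(e -> e.lastUsedMillis)); // oldest first

        List<Entry> victims = new ArrayList<>();
        for (Entry e : unused) {
            if (usedBytes <= goalBytes) {
                break; // goal reached: keep the remaining unused entries
            }
            victims.add(e);
            usedBytes -= e.sizeBytes;
        }
        return victims;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("a", 30, 1, 0),
                new Entry("b", 30, 5, 1), // still in use, never a victim
                new Entry("c", 30, 3, 0),
                new Entry("d", 20, 4, 0));
        for (Entry e : selectVictims(entries, 100, 0.5)) {
            System.out.println(e.key); // prints a, then c; d survives
        }
    }
}
```

With a limit of 100 bytes and a goal of 0.5, the sketch deletes `a` and `c` and keeps `d`, whereas deleting every unused entry would also have removed `d`.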

Does that make sense?


> Make the distributed cache delete entries using LRU priority
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2494
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> Currently the distributed cache waits until a cache directory exceeds a 
> preconfigured threshold, at which point it deletes every entry that is not 
> currently in use.  We would likely get far fewer cache misses if we kept some 
> of those entries around, even when they are not being used.  We should add a 
> configurable goal percentage for how much of the cache should be left clear 
> once unused entries are removed, and select which objects to delete based on 
> how recently they were used, and possibly also on how large they are or how 
> difficult it is to download them again.
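One possible way to track the recency ordering described above is `java.util.LinkedHashMap` constructed with `accessOrder = true`, whose iteration runs from least to most recently accessed key. The sketch below assumes a hypothetical `AccessOrderCache` that pins in-use entries with a reference count; it illustrates the idea only and is not how Hadoop's distributed cache is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AccessOrderCache {
    // accessOrder = true: iteration goes from least to most recently accessed key.
    private final LinkedHashMap<String, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<String, Integer> refCounts = new HashMap<>();
    private long usedBytes = 0;

    void add(String key, long sizeBytes) {
        Long previous = sizes.put(key, sizeBytes);
        usedBytes += sizeBytes - (previous == null ? 0 : previous);
    }

    /** A task starts using the entry: counts as an access and pins it. */
    void acquire(String key) {
        sizes.get(key); // get() moves the key to the most-recently-used end
        refCounts.merge(key, 1, Integer::sum);
    }

    /** The task is done; the entry stays cached but becomes evictable. */
    void release(String key) {
        refCounts.merge(key, -1, Integer::sum);
    }

    /** Evicts unpinned entries in LRU order until usedBytes <= goalBytes. */
    List<String> purgeTo(long goalBytes) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        while (usedBytes > goalBytes && it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (refCounts.getOrDefault(e.getKey(), 0) > 0) {
                continue; // in use: never delete
            }
            usedBytes -= e.getValue();
            evicted.add(e.getKey());
            it.remove();
        }
        return evicted;
    }

    long getUsedBytes() {
        return usedBytes;
    }

    public static void main(String[] args) {
        AccessOrderCache cache = new AccessOrderCache();
        cache.add("a", 30);
        cache.add("b", 30);
        cache.add("c", 30);
        cache.add("d", 20);
        cache.acquire("a");
        cache.release("a"); // a was used recently but is now idle
        cache.acquire("c"); // c is still in use
        System.out.println(cache.purgeTo(70)); // prints [b, d]
    }
}
```

Because the map reorders itself on every access, eviction simply walks from the front and skips pinned entries, so recently used but idle objects such as `a` survive a cleanup.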

