[ https://issues.apache.org/jira/browse/MAPREDUCE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868749#comment-15868749 ]
Weiwei Yang edited comment on MAPREDUCE-6847 at 2/16/17 3:32 AM:
-----------------------------------------------------------------

Hello [~jlowe]

Thanks for your comments, I appreciate them. What I want to achieve here is to let the JHS remove outdated jobs from its cache. At present the JHS cache works like this (assume, for example, that it is allowed to cache 5 jobs or an equivalent number of tasks):

# User clicks job1, job2 ... job5; JHS caches 5 jobs in memory
# JHS keeps all of these jobs in the cache
# A long time passes
# Job1, 2 .. 5 are now quite outdated; user clicks job6, JHS evicts one job, but the cache still contains 5 jobs: 1 new and the other 4 old

This is not a problem if the jobs are small, but if the jobs are large, e.g. 100k tasks each, 5 jobs in the cache will consume roughly 1.2 * 5 = 6 GB of memory or more. Is this really necessary? The patch simply expires jobs in the cache after a period of time, so that the cache holds recently accessed jobs rather than old ones that users are unlikely to access again.

Does that make sense to you?
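To illustrate the behaviour described above, here is a minimal sketch of a cache that evicts by size and also expires entries that have not been accessed for a while. It is not the code from the attached patch and not the actual JHS cache; the class name `ExpiringLruCache`, its constructor parameters, and the injected clock are all hypothetical and chosen only to keep the example self-contained.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch: an LRU cache that also drops idle entries. Not JHS code. */
class ExpiringLruCache<K, V> {
  private static final class Entry<V> {
    final V value;
    long lastAccessMs;
    Entry(V value, long now) { this.value = value; this.lastAccessMs = now; }
  }

  private final int maxEntries;            // size limit (assumed: counted in jobs)
  private final long expireAfterAccessMs;  // idle time after which an entry is dropped
  private final LongSupplier clock;        // injected so the behaviour is testable
  // accessOrder = true keeps the least recently used entry first
  private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>(16, 0.75f, true);

  ExpiringLruCache(int maxEntries, long expireAfterAccessMs, LongSupplier clock) {
    this.maxEntries = maxEntries;
    this.expireAfterAccessMs = expireAfterAccessMs;
    this.clock = clock;
  }

  synchronized void put(K key, V value) {
    evictExpired();
    map.put(key, new Entry<>(value, clock.getAsLong()));
    // Size-based eviction: remove least recently used entries over the limit.
    while (map.size() > maxEntries) {
      Iterator<K> it = map.keySet().iterator();
      it.next();
      it.remove();
    }
  }

  synchronized V get(K key) {
    evictExpired();
    Entry<V> e = map.get(key);
    if (e == null) {
      return null;
    }
    e.lastAccessMs = clock.getAsLong();
    return e.value;
  }

  synchronized int size() {
    evictExpired();
    return map.size();
  }

  // Time-based expiration: drop every entry idle for at least expireAfterAccessMs.
  private void evictExpired() {
    long now = clock.getAsLong();
    map.values().removeIf(e -> now - e.lastAccessMs >= expireAfterAccessMs);
  }
}
```

With size-based eviction alone, clicking job6 after a long idle period removes only one of the five old jobs. With the idle check added, all five old jobs are dropped on the next cache operation and only job6 stays in memory. Libraries such as Guava offer the same idea through `CacheBuilder.expireAfterAccess`, which may be a better fit than a hand-written class.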
> Job history server should release jobs from cache after a fixed duration
> ------------------------------------------------------------------------
>
> Key: MAPREDUCE-6847
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6847
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Attachments: MAPREDUCE-6847.01.patch
>
>
> We found that the history server consumes a lot of memory when there are large
> jobs (more than 100k tasks in a single job). Currently the JHS cache only
> evicts entries based on size; it would be better to add time-based expiration
> as well, to reduce heap usage when a job has not been accessed for some time.