[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902719#action_12902719 ]
Dick King commented on MAPREDUCE-323:
-------------------------------------

It is not used and I'll remove it.

Kind of a minor point ... saves about four executed lines on an error path that normally never happens ... but I'll make the change.

Because when we create a new subdirectory within the new runnable within moveToDone(JobID), the thread waits until enough time passes that that subdirectory will never have any entries added again, and then it writes the index. That ties up a thread, so we need an additional one to move the mail.

Indeed it isn't. I left it in the parameter chain because future code changes may use it. In particular we might place a ceiling on how many jobs there could ever come to be in one subdirectory, and that would take a JobID to enforce.

Actually, I have it backwards. We're indexing on every call, whether it needs it or not, which is bad. I'll fix this.

Yeah, when I abstracted out buildIndex I didn't delete enough code from the inline.

I make the 5 minute checkpoints because there is a small exposure to some history logs not getting indexed after a job tracker crash. This measure reduces the exposure.

I made the busy wait loop 30 seconds, rather than one second, and on every pass to reduce the load and to make this code run only as often as it needs to. However, I therefore increased the thread pool size to THREE: 1 to be in the loop waiting for the hour to end, 1 to be obsolete because the hour already ended during its half minute but it doesn't realize it yet, and 1 to copy a history file. That's three.

If we're in the usual case where there is only one instance busy-waiting, then two instances might flow into the copying code. This is harmless but not useful [since the whole copying code is run with the lock taken].
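
For illustration only, here is a minimal Java sketch of the behaviour described above: poll every 30 seconds, treat a subdirectory as closed once its hour has ended, and build its index at most once instead of on every call. The class and method names (HourlyIndexSketch, hourEnded, pollsUntilClosed, indexIfNeeded) are hypothetical and are not taken from the patch. The sketch also assumes that subdirectories are keyed by wall-clock hour, and that a set of already-indexed hours is a reasonable stand-in for the real bookkeeping.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the MAPREDUCE-323 patch. */
public class HourlyIndexSketch {
    static final long HOUR_MS = 60L * 60L * 1000L;
    // 30-second polling interval, as mentioned in the comment.
    static final long POLL_MS = 30L * 1000L;

    // Hour buckets whose index has already been written (assumed bookkeeping).
    private final Set<Long> indexedHours = new HashSet<>();

    /** Hour bucket that contains the given timestamp. */
    static long hourOf(long timeMs) {
        return timeMs / HOUR_MS;
    }

    /** True once the clock has moved past the subdirectory's hour bucket. */
    static boolean hourEnded(long subdirHour, long nowMs) {
        return hourOf(nowMs) > subdirHour;
    }

    /** Number of POLL_MS sleeps a waiter starting at startMs needs before the hour has ended. */
    static long pollsUntilClosed(long subdirHour, long startMs) {
        long endMs = (subdirHour + 1) * HOUR_MS;
        if (startMs >= endMs) {
            return 0;
        }
        return (endMs - startMs + POLL_MS - 1) / POLL_MS; // ceiling division
    }

    /**
     * Marks the subdirectory as indexed only if its hour has ended and it has
     * not been indexed before. Returns true if this call is the one that
     * should build the index.
     */
    synchronized boolean indexIfNeeded(long subdirHour, long nowMs) {
        if (!hourEnded(subdirHour, nowMs)) {
            return false; // entries may still be added; too early to index
        }
        return indexedHours.add(subdirHour); // false if already indexed
    }
}
```

With a guard like this, a second waiter that wakes up after the hour has ended finds the hour already marked as indexed and does no further work, which matches the "harmless but not useful" case above.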

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
>         Attachments: MR323--2010-08-20--1533.patch
>
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This
> can cause problems when there is a need to search the history folder
> (job-recovery etc). It would be nice if we group all the jobs under a _user_
> folder. So all the jobs for user _amar_ will go in _history-folder/amar/_.
> Jobs can be categorized using various features like _jobid, date, jobname_
> etc but using _username_ will make the search much more efficient and also
> will not result into namespace explosion.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.