[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874758#action_12874758 ]

Dick King commented on MAPREDUCE-323:
-------------------------------------

If the cluster configuration calls for any time stamps, we have to create them. We'll do this the first time we make a filename for a given job.

Having done that, we'll have a map from job serial numbers to directory segments [which we will intern; there will be many duplicates].

Having done _that_, we'll keep at most 250K of these entries; we'll drop the oldest one whenever adding a new one would otherwise push the table past that limit. We'll therefore use a {{TreeMap}}.

I expect about 20-40 bytes per entry: 16 bytes for each tree node, and 8 or 16 for the key, which would be an {{Integer}}. Recall that the directory segments are interned, so their cost would essentially vanish.

This table only exists if there is a time stamp operator in the format string.

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This
> can cause problems when there is a need to search the history folder
> (job-recovery etc). It would be nice if we group all the jobs under a _user_
> folder. So all the jobs for user _amar_ will go in _history-folder/amar/_.
> Jobs can be categorized using various features like _jobid, date, jobname_
> etc but using _username_ will make the search much more efficient and also
> will not result in namespace explosion.
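
To make the bounded table from the comment above concrete, here is a minimal sketch of how it might look. It is not taken from any patch on this issue: the class name {{JobDirectoryTable}}, its methods, and the example segment strings are all hypothetical, and it assumes that job serial numbers increase over time, so that the smallest key in the {{TreeMap}} is the oldest entry.

```java
import java.util.TreeMap;

/**
 * Hypothetical sketch: a bounded map from job serial number to an
 * interned directory segment. Not the actual JobHistory code.
 */
class JobDirectoryTable {
    private final int maxEntries;
    // Sorted by serial number, so the first entry is the oldest job
    // (assumption: serial numbers grow over time).
    private final TreeMap<Integer, String> segments = new TreeMap<>();

    JobDirectoryTable(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    /** Records the segment for a job; evicts the lowest serial number when over the limit. */
    synchronized void put(int jobSerial, String directorySegment) {
        // intern() makes duplicate segments share a single String instance.
        segments.put(jobSerial, directorySegment.intern());
        if (segments.size() > maxEntries) {
            // Note: if jobSerial is lower than every existing key,
            // the entry just added is the one that gets dropped.
            segments.pollFirstEntry();
        }
    }

    /** Returns the segment for a job, or null if it was never added or already evicted. */
    synchronized String get(int jobSerial) {
        return segments.get(jobSerial);
    }

    synchronized int size() {
        return segments.size();
    }
}
```

With a limit of 250000, as proposed above, a caller would do something like {{new JobDirectoryTable(250000)}} and call {{put(serial, segment)}} the first time a filename is built for a job. Keying by serial number is what lets {{pollFirstEntry()}} find the oldest entry cheaply, which is presumably the reason for choosing a {{TreeMap}} over a hash map.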