[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874651#action_12874651 ]
Dick King commented on MAPREDUCE-323:
-------------------------------------

The user and jobID index can both be obtained from the arguments to calls to the API {{JobHistory.getJobHistoryFile(...)}}. The time stamp cannot. If the cluster owner specifies a time-stamp-based directory structure, I'll have to store a map from jobID index to the time of the first {{JobHistory.getJobHistoryFile(...)}} call to support it. This map lives in the JobTracker and, when this feature is used, imposes a practical limit of perhaps half a million jobs. Does this seem reasonable?

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This
> can cause problems when there is a need to search the history folder
> (job recovery, etc.). It would be nice if we group all the jobs under a _user_
> folder. So all the jobs for user _amar_ will go in _history-folder/amar/_.
> Jobs can be categorized using various features like _jobid, date, jobname_,
> etc., but using _username_ will make the search much more efficient and also
> will not result in namespace explosion.
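A minimal sketch of the bookkeeping described in the comment: the user and job ID arrive with every lookup, but the time stamp has to be recorded once, on the first lookup, and reused afterwards so the same job always maps to the same directory. The class `HistoryDirLayout`, its methods, the UTC `yyyy/MM/dd` day granularity, and nesting the date directories under the per-user folder from the issue are all hypothetical choices for illustration; this is not Hadoop's `JobHistory` API and does not reproduce the signature of `getJobHistoryFile(...)`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch, not Hadoop's JobHistory API: builds per-user history
// directories and, for the time-stamp layout, remembers when each job's
// history file was first requested so later calls map to the same directory.
final class HistoryDirLayout {
    // Day-granularity directories in UTC; the layout is an assumption.
    private static final DateTimeFormatter DAY_DIRS =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    private final String historyRoot;
    private final LongSupplier clockMillis;
    // jobID -> epoch millis of the first lookup. Kept in JobTracker memory,
    // so it holds one entry per job seen.
    private final Map<String, Long> firstLookupMillis = new ConcurrentHashMap<>();

    HistoryDirLayout(String historyRoot, LongSupplier clockMillis) {
        this.historyRoot = historyRoot;
        this.clockMillis = clockMillis;
    }

    // Layout proposed in the issue: <root>/<user>
    String userDir(String user) {
        return historyRoot + "/" + user;
    }

    // Time-stamp layout: <root>/<user>/yyyy/MM/dd, fixed at the first lookup.
    String timestampedDir(String user, String jobId) {
        // computeIfAbsent reads the clock only on the first call for this job.
        long first = firstLookupMillis.computeIfAbsent(jobId, id -> clockMillis.getAsLong());
        return userDir(user) + "/" + DAY_DIRS.format(Instant.ofEpochMilli(first));
    }

    int trackedJobs() {
        return firstLookupMillis.size();
    }

    public static void main(String[] args) {
        long[] now = {86_400_000L}; // one day after the epoch
        HistoryDirLayout layout = new HistoryDirLayout("history-folder", () -> now[0]);
        System.out.println(layout.userDir("amar"));                 // history-folder/amar
        System.out.println(layout.timestampedDir("amar", "job_1")); // history-folder/amar/1970/01/02
        now[0] += 9 * 86_400_000L; // advance the clock nine days
        System.out.println(layout.timestampedDir("amar", "job_1")); // still history-folder/amar/1970/01/02
        System.out.println(layout.timestampedDir("amar", "job_2")); // history-folder/amar/1970/01/11
        System.out.println(layout.trackedJobs());                   // 2
    }
}
```

Because the map keeps one entry per job for as long as the JobTracker holds it, its memory footprint grows with the number of jobs, which is where the practical limit mentioned in the comment comes from. The injectable clock is only there so the sketch can be exercised deterministically.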