[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878824#action_12878824 ]
Dick King commented on MAPREDUCE-323:
-------------------------------------

After some discussions, we've come to some decisions.

1: We'll store the completed jobs' history files in the DFS done history files tree, in the following fixed format:

{{DONE/job-tracker-instance-ID/YYYY/MM/DD/987654/}}

The job tracker instance ID includes both the job tracker machine name and the epoch time of the instance start. There won't be very many directories on this level.

{{YYYY/MM/DD}} documents the date of completion [actually, the date that the history file is copied to DFS].

{{987654}} is the leading six digits of the job serial number, considered as a nine-digit integer. The leading zeros ARE included, so the directories can be enumerated correctly in lexicographical order. Therefore, no directory will have more than 2000 files, except in the unlikely case that there are more than 2 million jobs in one day.

2: We will modify the web application, {{jobhistory.jsp}}, in the following ways:

2a: We will decide how many jobs to filter based on the following criteria:

2a1: We stop at 11 tranches of serial numbers [the tenth boundary] or a day boundary, whichever comes first [but that page delivers buttons inviting you to ask for previous days, or more tranches]. Of course, as now, we stop at 100 items if we get that many items before crossing the directory boundary, but in the new code we will remember where to continue. Also, in the new codebase we won't {{ls}} the files we don't present, which improves responsiveness accordingly.

2b: We will present the job history links, newest first.
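To make the directory layout in item 1 concrete, here is a minimal sketch of how a job's done-history directory could be derived. It is illustrative only and is not the code in the patch: the function name {{done_history_dir}}, its parameters, and the placeholder instance ID are all hypothetical, and the instance ID is treated as an opaque string.

```python
from datetime import date


def done_history_dir(done_root, jt_instance_id, copied_on, job_serial):
    """Build DONE/<jt-instance-id>/YYYY/MM/DD/<six-digit prefix>/ for one job.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not part of Hadoop.
    """
    if not 0 <= job_serial < 10**9:
        raise ValueError("job serial number must fit in nine digits")
    # Treat the serial number as a nine-digit integer, keeping leading zeros,
    # then take the leading six digits as the directory name.
    serial_prefix = f"{job_serial:09d}"[:6]
    parts = [
        done_root.rstrip("/"),
        jt_instance_id,
        f"{copied_on.year:04d}",
        f"{copied_on.month:02d}",
        f"{copied_on.day:02d}",
        serial_prefix,
    ]
    return "/".join(parts) + "/"


# Placeholder instance ID; the real one combines machine name and start time.
print(done_history_dir("DONE", "jt-instance-id", date(2010, 6, 14), 1234))
```

Because the prefix is zero-padded, a plain lexicographic sort of the directory names matches numeric order of the serial numbers, which is the property the layout relies on.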
2b1: To make this coherent, we will remember where we left off for pagination.

To summarize how the code will work, the pagination controls will look like this:

Available Jobs in History (displaying 100 jobs from 1 to 100)

{{[show all] [show 1000 per page] [show entire day] [first page] [last page]}}

{{< golem-jt1.megacorp.com-2010-05-18 golem-jt1.megacorp.com-2010-04-18 >}}
[current JT instance, previous and/or following. This line of pagination controls is omitted if there is only one.]

{{< newest 2010/06/14 2010/06/13 2010/06/12 2010/06/11 2010/06/10 oldest >}}
[current day, two days previous, two days succeeding -- only within the current JT instance]

{{< oldest 1 2 3 4 5 next newest >}}
[directional words change when the search direction changes]

2c: There is a notion of search direction. Currently we display oldest first, but I'm thinking of changing that because I judge "most recent first" to be the better default, especially as uptimes increase as the product becomes more mature. What do you think?

Users can change direction by going to "last page" -- or "oldest/newest date" -- or "oldest/newest task tracker". When you've done that, the navigation cursors change so you're going in the right direction.

> Improve the way job history files are managed
> ---------------------------------------------
>
> Key: MAPREDUCE-323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobtracker
> Affects Versions: 0.21.0, 0.22.0
> Reporter: Amar Kamat
> Assignee: Dick King
> Priority: Critical
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This
> can cause problems when there is a need to search the history folder
> (job-recovery etc). It would be nice if we group all the jobs under a _user_
> folder. So all the jobs for user _amar_ will go in _history-folder/amar/_.
> Jobs can be categorized using various features like _jobid, date, jobname_
> etc but using _username_ will make the search much more efficient and also
> will not result in namespace explosion.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.