[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed

Dick King (JIRA) Mon, 14 Jun 2010 18:39:44 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878824#action_12878824
 ]


Dick King commented on MAPREDUCE-323:
-------------------------------------

After some discussions, we've come to some decisions.

1: We'll store the completed jobs' history files in the DFS done-history 
directory tree, using the following fixed layout:

{{DONE/job-tracker-instance-ID/YYYY/MM/DD/987654/}}

The job tracker instance ID includes both the job tracker machine name and the 
epoch time of the instance start.  There won't be very many directories on this 
level.

{{YYYY/MM/DD}} documents the date of completion [actually, the date that the 
history file is copied to DFS].

{{987654}} stands for the leading six digits of the job serial number, 
considered as a nine-digit integer, so each directory covers 1000 consecutive 
serial numbers.  The leading zeros ARE included, so the directories can be 
enumerated correctly in lexicographical order.  Therefore, no directory will 
have more than 2000 files, except in the unlikely case that there are more than 
2 million jobs in one day.
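
A minimal sketch of how such a directory name could be assembled, assuming the 
serial number is available as an int and the job tracker instance ID is treated 
as an opaque string.  The class and method names ({{DoneDirLayout}}, 
{{serialBucket}}, {{doneSubdir}}) are hypothetical and not part of any actual 
patch:

```java
import java.util.Locale;

// Hypothetical helper; illustrates the layout described above, nothing more.
final class DoneDirLayout {

    // Assumption: serial numbers fit in nine digits (0 .. 999,999,999).
    // Zero-pad to nine digits and keep the leading six, so that each
    // directory covers 1000 consecutive serial numbers and the names
    // sort correctly in lexicographical order.
    static String serialBucket(int jobSerial) {
        String nineDigits = String.format(Locale.ROOT, "%09d", jobSerial);
        return nineDigits.substring(0, 6);
    }

    // Builds DONE/job-tracker-instance-ID/YYYY/MM/DD/987654/ for one job.
    // The date is the day the history file is copied to DFS.
    static String doneSubdir(String jtInstanceId, int year, int month,
                             int day, int jobSerial) {
        return String.format(Locale.ROOT, "DONE/%s/%04d/%02d/%02d/%s/",
                jtInstanceId, year, month, day, serialBucket(jobSerial));
    }
}
```

For example, {{doneSubdir("jt-1", 2010, 6, 14, 1234)}} gives 
{{DONE/jt-1/2010/06/14/000001/}}, where {{jt-1}} is a made-up instance ID.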

2: We will modify the web application, {{jobhistory.jsp}}, in the following 
ways:

2a: We will decide how many jobs to filter based on the following criteria:

2a1: We stop at 11 tranches of serial numbers [the tenth boundary] or a day 
boundary, whichever comes first [but that page delivers buttons inviting you to 
ask for previous days, or more tranches].  Of course, as now, we stop at 100 
items if we get that many items before crossing the directory boundary, but in 
the new code we will remember where to continue.  Also, in the new codebase 
we won't {{ls}} the files we don't present, improving the responsiveness 
accordingly.
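
The stopping rule in 2a1 could be sketched roughly as follows.  This is only an 
illustration under assumptions: it walks the serial-number directories of a 
single day in one direction, so the day boundary is simply the end of that 
day's directories, and {{HistoryPager}}, {{TrancheLister}}, {{Cursor}} and 
{{Page}} are hypothetical names, with {{TrancheLister}} standing in for a DFS 
directory listing:

```java
import java.util.ArrayList;
import java.util.List;

final class HistoryPager {
    static final int MAX_TRANCHES = 11;  // stop after 11 serial-number directories
    static final int MAX_ITEMS = 100;    // stop after 100 items

    /** Hypothetical stand-in for listing one serial-number directory. */
    interface TrancheLister {
        List<String> list(int trancheIndex);
    }

    /** Where the next page resumes: directory index and offset within it. */
    static final class Cursor {
        final int tranche;
        final int offset;
        Cursor(int tranche, int offset) {
            this.tranche = tranche;
            this.offset = offset;
        }
    }

    static final class Page {
        final List<String> items;
        final Cursor next;  // null once the day's directories are exhausted
        Page(List<String> items, Cursor next) {
            this.items = items;
            this.next = next;
        }
    }

    static Page nextPage(TrancheLister lister, int trancheCount, Cursor start) {
        List<String> items = new ArrayList<>();
        int tranche = start.tranche;
        int offset = start.offset;
        int visited = 0;
        while (tranche < trancheCount && visited < MAX_TRANCHES
                && items.size() < MAX_ITEMS) {
            // A directory is listed only when we are about to present from it.
            List<String> files = lister.list(tranche);
            while (offset < files.size() && items.size() < MAX_ITEMS) {
                items.add(files.get(offset));
                offset++;
            }
            if (offset < files.size()) {
                break;  // page filled mid-directory; resume at this offset
            }
            tranche++;
            offset = 0;
            visited++;
        }
        Cursor next = tranche < trancheCount ? new Cursor(tranche, offset) : null;
        return new Page(items, next);
    }
}
```

The returned cursor is what lets the next request continue where this one 
stopped without listing the directories again from the beginning.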

2b: We will present the job history links, newest first.

2b1: To make this coherent, we will remember where we left off for pagination.

To summarize how the code will work, the pagination controls will look like 
this:

Available Jobs in History (displaying 100 jobs from 1 to 100) {{[show all] 
[show 1000 per page] [show entire day] [first page][last page]}}

{{< golem-jt1.megacorp.com-2010-05-18 golem-jt1.megacorp.com-2010-04-18 >}} 
[current JT instance, previous and/or following.  This line of pagination 
controls is omitted if there is only one.]

{{< newest 2010/06/14  2010/06/13  2010/06/12 2010/06/11 2010/06/10 oldest >}}  
[current day, two days previous, two days succeeding -- only within the current 
JT instance]

{{< oldest 1 2 3 4 5 next newest >}} [the directional words change when the 
search direction changes]
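
As an illustration of the date control above, here is a small sketch that 
picks the current day plus up to two days on either side, restricted to days 
that exist within the current JT instance.  It assumes calendar days (days 
without a directory are simply skipped), and {{DayControl}} and {{window}} are 
hypothetical names:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class DayControl {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Returns the labels for the date row: two days before and two days
    // after the current day, in the order given by the search direction.
    static List<String> window(Set<LocalDate> daysInInstance,
                               LocalDate current, boolean newestFirst) {
        List<String> labels = new ArrayList<>();
        for (int delta = -2; delta <= 2; delta++) {
            // Newest first walks from current+2 down to current-2.
            LocalDate day = current.plusDays(newestFirst ? -delta : delta);
            if (daysInInstance.contains(day)) {
                labels.add(FMT.format(day));
            }
        }
        return labels;
    }
}
```

With history present for 2010/06/10 through 2010/06/14 and 2010/06/12 as the 
current day, the newest-first call yields the five dates in the order shown in 
the example row above.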

2c: There is a notion of search direction.  Currently we display oldest first, 
but I'm thinking of changing that because I judge "most recent first" to be the 
better default, especially as uptimes increase as the product becomes more 
mature.  What do you think?

Users can change direction by going to "last page" -- or "oldest/newest date" 
-- or "oldest/newest job tracker instance".  When you've done that, the 
navigation cursors change so you're going in the right direction.


> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This 
> can cause problems when there is a need to search the history folder 
> (job-recovery etc). It would be nice if we group all the jobs under a _user_ 
> folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. 
> Jobs can be categorized using various features like _jobid, date, jobname_ 
> etc but using _username_ will make the search much more efficient and also 
> will not result in namespace explosion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
