[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Sreekanth Ramakrishnan (JIRA) Tue, 16 Sep 2008 07:35:14 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-5.patch

Made modifications according to the comments. I have also noted the reasons why 
some things are left as they were in the previous version of the patch.

{quote}
JobTracker:
    - There's code being repeated in getAllJobs(), getAllJobs(String queue) and 
jobsToComplete. I think it should be factored out so changes to one of the 
methods (for e.g. to return a new field) need not be duplicated.
{quote} 
The repeated code for converting a collection of _JobInProgress_ objects to an 
array of _JobStatus_ has been removed. Modified getAllJobs() and 
getAllJobs(String queue). Left jobsToComplete as is.
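
A minimal sketch of the kind of shared helper this refers to. The _JobInProgress_ and _JobStatus_ classes below are simplified stand-ins rather than the real Hadoop classes, and the helper name toJobStatusArray is hypothetical; the patch itself may structure this differently.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified stand-in for Hadoop's JobStatus (assumption: only an id is kept).
class JobStatus {
    final String jobId;
    JobStatus(String jobId) { this.jobId = jobId; }
}

// Simplified stand-in for Hadoop's JobInProgress.
class JobInProgress {
    private final JobStatus status;
    JobInProgress(String jobId) { this.status = new JobStatus(jobId); }
    JobStatus getStatus() { return status; }
}

class JobStatusConversionSketch {
    // Hypothetical shared helper: getAllJobs() and getAllJobs(String queue)
    // would both delegate here, so a change to the conversion is made once.
    static JobStatus[] toJobStatusArray(Collection<JobInProgress> jobs) {
        List<JobStatus> statuses = new ArrayList<>();
        if (jobs != null) {
            for (JobInProgress job : jobs) {
                statuses.add(job.getStatus());
            }
        }
        return statuses.toArray(new JobStatus[0]);
    }
}
```

With a helper like this, each caller only has to select the collection of jobs it cares about and pass it in.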

{quote}
JobQueueInfo:
    - schedulingInfo stored here is a stringified version. I think it should be 
declared a String and get/set should deal with strings. The caller should 
basically call with actualObject.toString(). This makes it similar to JobStatus.
{quote}
The reason we use an Object and pass only a String over the wire is that we set 
the scheduling information only once. After that, the underlying object is 
updated by the respective TaskScheduler, and we call toString() on it when 
passing it over the wire. This way we avoid having to constantly update the 
scheduling information in the queue manager. For an example, see 
_CapacityTaskScheduler_.
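
The idea can be sketched roughly as follows. All class and method names here (QueueInfoSketch, CapacityInfoSketch, getSchedulingInfoForWire) are hypothetical stand-ins for illustration, not the actual _JobQueueInfo_ or _CapacityTaskScheduler_ code, and returning an empty string for a missing object is just a choice made for this sketch.

```java
// Hypothetical stand-in for the queue info held by the queue manager.
class QueueInfoSketch {
    private final String queueName;
    // Set once; the scheduler keeps mutating the referenced object afterwards.
    private Object schedulingInfo;

    QueueInfoSketch(String queueName) { this.queueName = queueName; }

    String getQueueName() { return queueName; }

    void setSchedulingInfo(Object schedulingInfo) {
        this.schedulingInfo = schedulingInfo;
    }

    // What would be serialized: a String snapshot taken at call time.
    String getSchedulingInfoForWire() {
        return schedulingInfo == null ? "" : schedulingInfo.toString();
    }
}

// Hypothetical scheduler-owned state for one queue.
class CapacityInfoSketch {
    private int runningJobs;

    void jobStarted() { runningJobs++; }

    @Override
    public String toString() { return "running jobs: " + runningJobs; }
}
```

Because the queue info holds a reference rather than a copied String, the scheduler never has to push updates; each call to getSchedulingInfoForWire() reflects the current state of the scheduler's object.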

{quote}
JspUtil:
    - This is including JspHelper which is a class from the NameNode package. I 
don't think it is a good idea for a MapRed class to depend on this, however I 
understand this has always been this way. Maybe we should file a new JIRA to 
fix it.
{quote}
It uses JspHelper from that package to generate the percentage graph. Maybe 
that method should be moved into the ServletUtil class in the core util package.

{quote}
CapacityTaskScheduler:
- Does not need supportsPriority as a separate field in the SchedulingInfo 
class. You can pick it up from one of the QueueSchedulingInfo objects.
{quote}
Whether a queue supports priority is stored by the JobQueueManager in the 
capacity scheduler. The queue scheduling information object does not record 
whether a particular queue supports priority, which is why there is a 
separate field.

{quote}
TestJobQueueInformation:
    - I think you can use JobClient, instead of directly dealing with 
JobSubmissionProtocol and having to duplicate the methods for createRPCProxy 
etc.
{quote}

The reason I am not using JobClient directly is that doing so would invoke the 
display methods, and we would then have to parse the job client's output 
before testing for equality. Moreover, all the newly defined display methods 
are private. If it is really required, I can make them public and change the 
test to parse the display string and check equality.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, 
> HADOOP-3930-4.patch, HADOOP-3930-5.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to 
> provide info to display on the JobTracker web interface and in the CLI. The 
> main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and 
> in the CLI - something as simple as a single string, or a map<string, int> 
> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the 
> existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns 
> key-value pairs which are displayed in columns on the web UI or the CLI for 
> the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns 
> key-value pairs which are displayed in columns on the web UI or the CLI for 
> the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the 
> list of jobs in a given queue, sorted by a scheduler-specific order (the 
> order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
