[
https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666061#action_12666061
]
Vivek Ratan commented on HADOOP-4413:
-------------------------------------
@Mac:
bq. I think either way we will want to be able to correlate the job life cycle
events with the scheduler events.
Absolutely. That's why I kept the Job IDs out of the methods of
CapacitySchedulerInstrumentation. If we can't synchronize the scheduler's
events with the jobs' events, we can look at modifying these methods. We're
logging, or collecting, a lot of information. The key is to see how to parse
this information to present a unified life cycle view - for a job, for a queue,
etc.
@hemanth:
bq. The other two classes are using it, and so they need it. We could add it
when required, no ?
ChukwaTTInstru doesn't use the TaskTracker member variable, though
TaskTrackerMetricsInst does. The Scheduler member variable seems useful (for
future classes) and logical to be in CapacitySchedulerInstrumentation. Plus, we
don't want too many changes to CapacitySchedulerInstrumentation - it acts like
an interface.
bq. I think some of the information is not captured by the jobtracker
instrumentation at a job level - memory based blocking for instance, also our
initialization logic is different.
We capture memory based blocking through
CapacitySchedulerInstrumentation.blockOnHighMemJob. Does that need a job
parameter? Maybe not. Maybe we only care about how many times we blocked.
If we also want to know on which job we blocked, we can add a job
parameter.
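To make the two options concrete, here is a rough sketch. It is not the code in
4413.1.patch: the class names, the String job ID, the overload that takes a
job, and the getters are all assumptions made for illustration. Only the method
name blockOnHighMemJob comes from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the instrumentation base class (not the patch). */
class SketchSchedulerInstrumentation {
    /** Called when the scheduler blocks on a high-memory job; no-op by default. */
    public void blockOnHighMemJob() {}

    /** Hypothetical overload that also names the job; delegates by default. */
    public void blockOnHighMemJob(String jobId) {
        blockOnHighMemJob();
    }
}

/** Example subclass that counts blocking events, in total and per job. */
class CountingInstrumentation extends SketchSchedulerInstrumentation {
    private long totalBlocks = 0;
    private final Map<String, Long> blocksPerJob = new HashMap<>();

    @Override
    public synchronized void blockOnHighMemJob() {
        totalBlocks++;
    }

    @Override
    public synchronized void blockOnHighMemJob(String jobId) {
        // Count the event once overall and once against the job it names.
        totalBlocks++;
        blocksPerJob.merge(jobId, 1L, Long::sum);
    }

    public synchronized long getTotalBlocks() {
        return totalBlocks;
    }

    public synchronized long getBlocksForJob(String jobId) {
        return blocksPerJob.getOrDefault(jobId, 0L);
    }
}
```

With the no-argument form we only learn how often we blocked. The overload
costs one extra parameter and lets a subclass attribute each block to a job,
while subclasses that ignore the job keep working through the default
delegation.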
Do we want to capture events in job initialization? I'm not sure. On one hand,
job initialization is an internal thing - it's not an external facing event. I
see CapacitySchedulerInstrumentation as capturing the external events of the
scheduler, events that are familiar to a user or to Ops. If a job's running, I
know it's initialized. If I want to detect how well my initialization routine
is running, I'd use log files for that. However, if we feel the need to capture
and track job initialization events, we can add them. I just didn't see a need.
But if you do, it would be great if you can suggest what methods to add to
capture initialization of jobs.
bq. Essentially, if we could work a little bit on what kind of information we
want captured, it might help us better
I think we have, at least to get started. There's a listing of what we want to
capture at the beginning of this Jira. I think we're covering all of that. Do
you feel we're missing something? Again, I sense that what we want to capture
will become clearer once we run this thing and start analyzing life cycle
events. I've tried to capture whatever I thought would be important. But
feel free to suggest other events.
> Capacity Scheduler to provide a scheduler history log to record actions taken
> and why
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-4413
> URL: https://issues.apache.org/jira/browse/HADOOP-4413
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/capacity-sched
> Reporter: Mac Yang
> Attachments: 4413.1.patch
>
>
> It would be very useful if the capacity scheduler can provide a log that
> records the decisions made and actions taken by the scheduler.