[
http://issues.apache.org/jira/browse/HADOOP-489?page=comments#action_12438557 ]
Owen O'Malley commented on HADOOP-489:
--------------------------------------
I'd like to have:
1. A jsp on the task trackers that lets me fetch stdout/stderr from the
Tasks with urls like:
http://<tracker>/getUserLogs?job=<jobid>&task=<taskid>
getUserLogs would also have optional args for start=<#> and length=<#> that
control the start and length of the logs sent. A negative starting position
would count backwards from the end of the file.
2. An addition to the JobSubmissionProtocol:
TaskReport getTaskReport(String jobid, String taskid);
3. An addition to TaskReport to get the url for the logs for that task.
4. A job/event log available via the JobSubmissionProtocol:
List<JobEvent> getJobEvents(String jobid, long startTime);
5. JobEvents are:
a. job start
b. task start
c. task end
d. job end
e. diagnostic
6. each JobEvent has:
long getTime()
String getTaskId()
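
To make the above concrete, here is a rough sketch in Java of how the start/length arguments and the JobEvent accessors might behave. It is only an illustration under stated assumptions, not a proposed patch. The class name UserLogSketch, the Kind enum, getKind(), the use of -1 to mean "length omitted", the clamping of out-of-range values, and the "at or after startTime" filter are all my own assumptions; only getTime(), getTaskId(), the five event kinds and the URL shape come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

public class UserLogSketch {

    /** The five event kinds from item 5; the enum itself is hypothetical. */
    public enum Kind { JOB_START, TASK_START, TASK_END, JOB_END, DIAGNOSTIC }

    /** Item 6 names getTime() and getTaskId(); getKind() is an assumption. */
    public static final class JobEvent {
        private final Kind kind;
        private final long time;
        private final String taskId; // assumed null for job-level events

        public JobEvent(Kind kind, long time, String taskId) {
            this.kind = kind;
            this.time = time;
            this.taskId = taskId;
        }

        public Kind getKind() { return kind; }
        public long getTime() { return time; }
        public String getTaskId() { return taskId; }
    }

    /** Builds a log URL in the shape given in item 1. */
    public static String logUrl(String tracker, String jobId, String taskId,
                                long start, long length) {
        return "http://" + tracker + "/getUserLogs?job=" + jobId
                + "&task=" + taskId + "&start=" + start + "&length=" + length;
    }

    /**
     * Resolves start/length against a log file of fileLength bytes.
     * A negative start counts backwards from the end of the file.
     * Assumptions: length < 0 means "to end of file"; values are clamped.
     * Returns { offset, byteCount }.
     */
    public static long[] resolveRange(long fileLength, long start, long length) {
        long offset = start < 0
                ? Math.max(0, fileLength + start)
                : Math.min(start, fileLength);
        long remaining = fileLength - offset;
        long count = length < 0 ? remaining : Math.min(length, remaining);
        return new long[] { offset, count };
    }

    /** Assumed meaning of getJobEvents: events at or after startTime. */
    public static List<JobEvent> eventsSince(List<JobEvent> all, long startTime) {
        List<JobEvent> result = new ArrayList<JobEvent>();
        for (JobEvent e : all) {
            if (e.getTime() >= startTime) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Last 100 bytes of a 1000-byte log: offset 900, 100 bytes.
        long[] tail = resolveRange(1000, -100, -1);
        System.out.println(tail[0] + " " + tail[1]);
    }
}
```

With this reading, start=-100 on a 1000-byte log returns the last 100 bytes, and a start that reaches past the beginning of the file simply returns the whole log.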
> Seperating user logs from system logs in map reduce
> ---------------------------------------------------
>
> Key: HADOOP-489
> URL: http://issues.apache.org/jira/browse/HADOOP-489
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Mahadev konar
> Assigned To: Owen O'Malley
> Priority: Minor
>
> Currently the user logs are a part of system logs in mapreduce. Anything
> logged by the user is logged into the tasktracker log files. This creates two
> issues:
> 1) The system log files get cluttered with user output. If the user outputs a
> large amount of logs, the system logs need to be cleaned up pretty often.
> 2) For the user, it is difficult to get to each of the machines and look for
> the logs his/her job might have generated.
> I am proposing three solutions to the problem. Each of them has its own
> issues:
> Solution 1.
> Output the user logs on the user screen as part of the job submission
> process.
> Merits-
> This will prevent users from printing large amounts of logs and the user can
> get runtime feedback on what is wrong with his/her job.
> Issues -
> This proposal will use the framework bandwidth while running jobs for the
> user. The user logs will need to pass from the tasks to the tasktrackers,
> from the tasktrackers to the jobtrackers and then from the jobtrackers to the
> jobclient using a lot of framework bandwidth if the user is printing out too
> much data.
> Solution 2.
> Output the user logs onto a dfs directory and then concatenate these files.
> Each task can create a file for the output in the log directory for a given
> user and jobid.
> Issues -
> This will create a huge number of small files in DFS, which can later be
> concatenated into a single file. It is also unclear who would concatenate
> these files into a single file. This could be done by the framework
> (jobtracker) as part of the cleanup for the jobs, but that might stress the
> jobtracker.
>
> Solution 3.
> Put the user logs into a separate user log file in the log directory on each
> tasktracker. We can provide some tools to query these local log files. We
> could have commands like: for jobid j and taskid t, get me the user log
> output. These tools could run as a separate map reduce program, with each map
> grepping the user log files and a single reduce aggregating these logs into
> a single dfs file.
> Issues-
> This does sound like more work for the user. Also, the output might not be
> complete since a tasktracker might have gone down after it ran the job.
> Any thoughts?