[ http://issues.apache.org/jira/browse/HADOOP-489?page=all ]
Arun C Murthy updated HADOOP-489:
---------------------------------

    Attachment: HADOOP-489_20061111.patch

Thanks for the feedback Doug...

> I can see no reason why TaskLog needs to be public. It should be
> package-private.

Fixed. (attached patch)

> 'ant test' creates a 'userlog' directory in the connected directory that is
> not removed. This should instead be created in build/test, no? (The
> 'history' directory is also created, but that is not introduced by this
> patch.)

This seems to be due to (a bug in?) the way the system property
'hadoop.log.dir' is defined in the 'test-core' target...

build.xml - line nos. 329-331

  <sysproperty key="hadoop.log.dir" value="${hadoop.log.dir}"/>
  <sysproperty key="test.src.dir" value="${test.src.dir}"/>
  <sysproperty key="hadoop.log.dir" value="."/>

This causes the userlog/history directories to be wrongly created...
TaskLog relies on 'hadoop.log.dir' for the parent dir of 'userlogs'.

Is there a particular reason

  <sysproperty key="hadoop.log.dir" value="."/>

is needed? If not, I'll file a new bug?

> Seperating user logs from system logs in map reduce
> ---------------------------------------------------
>
>          Key: HADOOP-489
>          URL: http://issues.apache.org/jira/browse/HADOOP-489
>      Project: Hadoop
>   Issue Type: Improvement
>   Components: mapred
>     Reporter: Mahadev konar
>  Assigned To: Arun C Murthy
>     Priority: Minor
>      Fix For: 0.9.0
>
>  Attachments: HADOOP-489_20061019.patch, HADOOP-489_20061101.patch,
>               HADOOP-489_20061102.patch, HADOOP-489_20061107.patch,
>               HADOOP-489_20061109.patch, HADOOP-489_20061111.patch
>
>
> Currently the user logs are a part of system logs in mapreduce. Anything
> logged by the user is logged into the tasktracker log files. This creates
> two issues -
> 1) The system log files get cluttered with user output. If the user outputs
> a large amount of logs, the system logs need to be cleaned up pretty often.
> 2) For the user, it is difficult to get to each of the machines and look
> for the logs his/her job might have generated.
>
> I am proposing three solutions to the problem. All of them have issues -
>
> Solution 1.
> Output the user logs on the user screen as part of the job submission
> process.
> Merits -
> This will prevent users from printing large amounts of logs, and the user
> can get runtime feedback on what is wrong with his/her job.
> Issues -
> This proposal will use the framework bandwidth while running jobs for the
> user. The user logs will need to pass from the tasks to the tasktrackers,
> from the tasktrackers to the jobtrackers and then from the jobtrackers to
> the jobclient, using a lot of framework bandwidth if the user is printing
> out too much data.
>
> Solution 2.
> Output the user logs onto a dfs directory and then concatenate these files.
> Each task can create a file for the output in the log directory for a given
> user and jobid.
> Issues -
> This will create a huge number of small files in DFS which can later be
> concatenated into a single file. There is also the question of who would
> concatenate these files into a single file. This could be done by the
> framework (jobtracker) as part of the cleanup for the jobs - might stress
> the jobtracker.
>
> Solution 3.
> Put the user logs into a separate user log file in the log directory on
> each tasktracker. We can provide some tools to query these local log files.
> We could have commands like: for jobid j and for taskid t, get me the user
> log output. These tools could run as a separate map reduce program with
> each map grepping the user log files and a single reduce aggregating these
> logs into a single dfs file.
> Issues -
> This does sound like more work for the user. Also, the output might not be
> complete since a tasktracker might have gone down after it ran the job.
>
> Any thoughts?

--
This message is automatically generated by JIRA.
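To illustrate the duplicate 'hadoop.log.dir' definition described in the comment above, here is a minimal Java sketch. It is not the TaskLog code from the patch. It assumes that when the same system property is set twice, the later value wins, which is what the behaviour reported for 'ant test' suggests. The class and method names, and the 'build/test/logs' value standing in for ${hadoop.log.dir}, are hypothetical.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Hadoop's TaskLog.
public class LogDirSketch {

    // Apply key/value pairs in order, the way a launcher might set system
    // properties. Assumption: a later duplicate key overwrites the earlier one.
    static Map<String, String> applyInOrder(String[][] pairs) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            props.put(pair[0], pair[1]);
        }
        return props;
    }

    // Resolve the user log directory as a child of 'hadoop.log.dir',
    // mirroring the description that TaskLog uses that property as the
    // parent dir of 'userlogs'.
    static File userLogDir(Map<String, String> props) {
        return new File(props.get("hadoop.log.dir"), "userlogs");
    }

    public static void main(String[] args) {
        // Same order as the three sysproperty lines quoted from build.xml;
        // "build/test/logs" is a made-up stand-in for ${hadoop.log.dir}.
        Map<String, String> props = applyInOrder(new String[][] {
            {"hadoop.log.dir", "build/test/logs"},
            {"test.src.dir", "src/test"},
            {"hadoop.log.dir", "."},
        });
        // Under the last-wins assumption this resolves relative to ".",
        // i.e. the directory the tests were started from.
        System.out.println(userLogDir(props).getPath());
    }
}
```

If the assumption holds, dropping the third definition (or pointing it under build/test) would keep the test logs out of the working directory.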