[
https://issues.apache.org/jira/browse/HADOOP-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Owen O'Malley updated HADOOP-1553:
----------------------------------
Attachment: new-log.patch
This patch fixes the performance problems with user task logging. Before the
patch, running the word count example on a given input (Alice in Wonderland
*smile*) would take 6 seconds normally and minutes if the program printed to
stdout. After the patch, it takes 4 seconds with no stdout and 6 seconds with
printing.
This patch includes several incompatible changes:
1. The user logs are no longer stored in segments, but rather as complete files.
2. All tasks are launched via bash to get input redirection.
3. The cap on user logs has been turned off by default. It is still
available, but makes the command used to launch tasks much more complicated.
4. When the cap is enabled, the capped portion of the user log (up to the full
cap size) is now held in memory rather than on disk. Thus, setting the cap to
a large value may cause problems.
5. The task logger has fewer configuration knobs; the dropped ones have been
removed from log4j.properties.
6. The urls to access the task logs from the task tracker have changed. The
new urls only have start and end offsets, but the offsets may be either
positive from the start of the file or negative from the end of the file.
7. The jsp has been replaced by a servlet, so that the bytes don't need to be
interpreted as a string.
8. The servlet does not buffer the entire log into memory before it is sent to
the user.
9. The TaskLog class is now public so that pipes can use it.
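The offset handling described in item 6 could be sketched roughly as follows.
This is an illustration only, not the code from new-log.patch: the class name
LogRange, the method resolve, and the convention that a negative offset n maps
to fileLength + n + 1 (so that -1 names the end of the file) are all
assumptions made for this example.

```java
/**
 * Hypothetical helper: turns a (start, end) pair of log offsets into an
 * absolute byte range [start, end) within a file of known length.
 * Not taken from the patch; names and conventions are assumptions.
 */
final class LogRange {
    final long start; // inclusive, absolute position in the file
    final long end;   // exclusive, absolute position in the file

    private LogRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Non-negative offsets count from the start of the file.
     * Negative offsets count back from the end; by the assumed
     * convention, -1 refers to the position just past the last byte.
     */
    static LogRange resolve(long startOffset, long endOffset, long fileLength) {
        long s = startOffset < 0 ? fileLength + startOffset + 1 : startOffset;
        long e = endOffset < 0 ? fileLength + endOffset + 1 : endOffset;

        // Clamp both ends into the file, and never let end precede start.
        s = Math.max(0, Math.min(s, fileLength));
        e = Math.max(s, Math.min(e, fileLength));
        return new LogRange(s, e);
    }

    long length() {
        return end - start;
    }
}
```

Under these assumptions, resolve(0, -1, len) selects the whole file and
resolve(-101, -1, len) selects its last 100 bytes. A servlet could then stream
only that range in fixed-size chunks rather than loading the whole log.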
> Extensive logging of C++ application can slow down task by an order of
> magnitude
> --------------------------------------------------------------------------------
>
> Key: HADOOP-1553
> URL: https://issues.apache.org/jira/browse/HADOOP-1553
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.13.0
> Reporter: Christian Kunz
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: new-log.patch
>
>
> We observed that extensive logging (due to some configuration mistake) of a
> c++ application using the pipes interface can slow down the task by an order
> of magnitude. During that time disk usage was not high, with no abnormal
> memory usage, and basically idle CPU.