[ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Bieniosek updated HADOOP-1524:
--------------------------------------
Attachment: eliminate-split-idx.patch
This patch eliminates the use of split.idx. Instead, the information is read
directly from the file system.
> Task Logs userlogs don't show up for a while
> ---------------------------------------------
>
> Key: HADOOP-1524
> URL: https://issues.apache.org/jira/browse/HADOOP-1524
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.13.0
> Reporter: Michael Bieniosek
> Attachments: eliminate-split-idx.patch
>
>
> When I start a task and go to the task logs, nothing shows up for a while.
> An examination of TaskLog.Writer and TaskLog.Reader reveals:
> 1. The TaskLog.Reader relies on the presence of a split.idx to identify the
> parts of the logs to display.
> 2. The TaskLog.Writer only updates the split.idx file when it moves on to the
> next log.
> As a result, updates to the log only become visible once an entire part file is done.
> Why is there a split.idx file? It seems that since files are called
> part-00000, part-00001, etc., the TaskLog.Reader can just look at all files
> and arrange them by alphabetical order. The split.idx file also contains
> file length, but this data is already stored by the filesystem.
> If nobody has objections, I'd like to write a patch to eliminate the
> split.idx file.
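
The approach proposed above (list the part-NNNNN files, sort them by name, and
take each length from the file system) can be sketched roughly as follows. This
is an illustration only, not the contents of eliminate-split-idx.patch: it uses
java.nio.file rather than Hadoop's own file system classes, and the names
PartFileLister, Segment and listSegments are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: finds log segments without split.idx.
public class PartFileLister {

    /** One log segment: its path and its current length in bytes. */
    public static final class Segment {
        public final Path path;
        public final long length;

        Segment(Path path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /**
     * Lists every regular file named part-* in logDir, ordered by file name.
     * Lengths come from the file system at call time, so a segment that is
     * still being written is reported with whatever has reached disk so far.
     */
    public static List<Segment> listSegments(Path logDir) throws IOException {
        List<Segment> segments = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(logDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    segments.add(new Segment(p, Files.size(p)));
                }
            }
        }
        // Directory iteration order is unspecified, so sort explicitly by name.
        segments.sort(Comparator.comparing((Segment s) -> s.path.getFileName().toString()));
        return segments;
    }
}
```

Sorting by name matches numeric order only while the suffix stays zero-padded
to a fixed width, as in part-00000 and part-00001.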
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.