[
https://issues.apache.org/jira/browse/HADOOP-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
[EMAIL PROTECTED] updated HADOOP-1181:
--------------------------------------
Attachment: hadoop1181.patch
Attached is a patch that changes TaskLog$Reader so it uses URLs instead of the
file system. It also:
+ Adds a constructor that takes a userlog subdirectory URL.
+ Adds a public getInputStream method that streams over all userlog parts.
+ Makes TaskLog and TaskLog$Reader public rather than default access.
+ Adds a main that takes a URL and prints the concatenated logs to stdout.
I'll not mark this issue as 'patch ready' until others have had a gander.
It would be great if Arun C Murthy could review since he wrote the original.
In particular, it would be nice to know whether the calculation of
totalLogSize in the TaskLog$Reader#fetchAll method -- around line 384 in
r523437 -- is important. If not, then some near-duplicate code could be
replaced with a call to the new getInputStream in a version 2 of this patch.
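The streaming-over-parts idea behind the new getInputStream can be sketched in plain Java. The class and method names below are illustrative only (the actual patch works against Hadoop's TaskLog$Reader and userlog part files, which are not reproduced here); the point is simply that chaining the per-part streams with SequenceInputStream presents them as one continuous log:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UserlogConcat {

    // Hypothetical analogue of the patch's getInputStream: chain the
    // streams for each userlog "part" into one continuous stream.
    static InputStream concat(List<InputStream> parts) {
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the per-part streams a userlog directory would yield.
        List<InputStream> parts = new ArrayList<>();
        parts.add(new ByteArrayInputStream("part0 line\n".getBytes()));
        parts.add(new ByteArrayInputStream("part1 line\n".getBytes()));

        // Read the concatenation and print it, as the patch's main does
        // for the real userlogs.
        try (InputStream in = concat(parts)) {
            System.out.print(new String(in.readAllBytes()));
        }
    }
}
```

With URL-backed streams (e.g. url.openStream()) substituted for the ByteArrayInputStreams, the same chaining would work over remote userlog parts rather than the local filesystem.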
> userlogs reader
> ---------------
>
> Key: HADOOP-1181
> URL: https://issues.apache.org/jira/browse/HADOOP-1181
> Project: Hadoop
> Issue Type: Improvement
> Reporter: [EMAIL PROTECTED]
> Attachments: hadoop1181.patch
>
>
> My jobs output lots of logging. I want to be able to quickly parse the logs
> across the cluster for anomalies. org.apache.hadoop.tool.Logalyzer looks
> promising at first, but it does not know how to deal with the userlog format
> and it wants to first copy all logs local. Digging around, there does not
> currently seem to be a reader for the hadoop userlog format. TaskLog$Reader
> is not generally accessible, and it too expects logs to be on the local
> filesystem (the latter is of little use if I want to run the analysis as a
> mapreduce job).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.