[ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419183 ]
eric baldeschwieler commented on HADOOP-342:
--------------------------------------------

It does make sense. It might make sense to do it as a second pass, though.

We've got lots of logs from various sources that we want this tool to work on. In many cases, loading them into Hadoop is a logical first step. We should make sure the loading (or HTTP scanning) is distinct from the query tools.

> Design/Implement a tool to support archival and analysis of logfiles.
> ---------------------------------------------------------------------
>
>          Key: HADOOP-342
>          URL: http://issues.apache.org/jira/browse/HADOOP-342
>      Project: Hadoop
>         Type: New Feature
>     Reporter: Arun C Murthy
>  Attachments: logalyzer.patch
>
> Requirements:
>   a) Create a tool to support archival of logfiles (from diverse sources) in Hadoop's DFS.
>   b) The tool should also support analysis of the logfiles via grep/sort primitives. The tool should allow fairly generic 'grep' patterns and let users 'sort' the matching lines (from grep) on 'columns' of their choice.
>      E.g. from Hadoop logs: look for all log lines containing 'FATAL' and sort them first on the timestamp (column x) and then on column y.
>
> Design/Implementation:
>   a) Log Archival
>      Archival of logs from diverse sources can be accomplished using the *distcp* tool (HADOOP-341).
>
>   b) Log analysis
>      The idea is to let users of the tool analyze logs via grep/sort primitives.
>      This can be accomplished with a relatively simple Map-Reduce job: the map does the *grep* for the given pattern via RegexMapper, and the implicit *sort* that feeds the reducer uses a custom Comparator which performs the user-specified comparison on the chosen columns.
>      The grep/sort specs can be fairly powerful, since users of the tool can use Java's built-in regex patterns (java.util.regex).
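The grep/sort semantics in the quoted description can be illustrated without Hadoop at all. What follows is a single-process Java sketch, not the code in logalyzer.patch: the filter loop stands in for what the description assigns to RegexMapper, and the comparator stands in for the custom sort Comparator. The class and method names (LogGrepSort, grepAndSort, byColumns, column) are hypothetical, and the sketch assumes whitespace-separated, 0-indexed columns compared as plain strings. It takes lines that are already loaded, so it stays separate from the loading step, in line with the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, single-process sketch of the grep/sort log analysis.
 * Assumptions: columns are whitespace-separated and 0-indexed, and
 * columns are compared as plain strings.
 */
public class LogGrepSort {

    /** Returns column i of a line, or "" if the line has fewer columns. */
    static String column(String line, int i) {
        String[] cols = line.trim().split("\\s+");
        return i < cols.length ? cols[i] : "";
    }

    /** Orders lines by the given columns in priority order (x, then y, ...). */
    static Comparator<String> byColumns(int... sortColumns) {
        return (a, b) -> {
            for (int col : sortColumns) {
                int r = column(a, col).compareTo(column(b, col));
                if (r != 0) {
                    return r; // first differing column decides
                }
            }
            return 0; // equal on every sort column
        };
    }

    /** "grep" step (the map's role) followed by the "sort" step (the comparator's role). */
    static List<String> grepAndSort(List<String> lines, String regex, int... sortColumns) {
        Pattern pattern = Pattern.compile(regex); // java.util.regex syntax
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) { // match anywhere in the line, like grep
                matches.add(line);
            }
        }
        matches.sort(byColumns(sortColumns));
        return matches;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2006-07-10 12:00:03,120 FATAL dfs.DataNode: disk failure",
            "2006-07-10 12:00:01,500 INFO mapred.JobTracker: job started",
            "2006-07-10 11:59:58,004 FATAL mapred.TaskTracker: lost tracker");
        // Lines containing FATAL, sorted on column 0 (date), then column 1 (time).
        for (String line : grepAndSort(log, "FATAL", 0, 1)) {
            System.out.println(line);
        }
    }
}
```

Running main prints the two FATAL lines, with the 11:59:58 entry first. Comparing columns as strings is a simplification: it orders ISO-style timestamps like these correctly, but numeric columns would need a numeric comparison.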
