Re: [jira] Commented: (HADOOP-342) Design/Implement a tool to support archival and analysis of logfiles.

Arkady Borkovsky Tue, 04 Jul 2006 14:22:48 -0700

It would be very nice to be able to see (analyze) the logs of a task that is still running.


-- ab


On Jul 3, 2006, at 12:39 PM, Arun C Murthy (JIRA) wrote:

[http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419011 ]
Arun C Murthy commented on HADOOP-342:
--------------------------------------
Should have clarified this: the plan is to let the user specify an output directory in which a single text file will contain the output of the 'analysis'.
Generic Sorter:
The generic sorter basically lets the user specify a column separator and a spec for the priority of columns. The Comparator's *compare* function (implements WritableComparable) then splits each sequence of data on the user-specified separator and compares the 2 data streams column by column, in the given priority order.
  E.g. -sortColumnSpec 2,0,1 -separator \t
  (0-based columns)
If there is enough interest, I can push this into mapred.lib. Appreciate any suggestions.
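A minimal sketch of the column-priority comparison described above, written in plain Java with no Hadoop types so it runs on its own. Assumptions (not from the original comment): columns are compared as plain strings, a line that is missing a column sorts before a line that has it, and the class name `ColumnSpecComparator` is hypothetical. Wiring it into a Hadoop job as a WritableComparator is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical comparator illustrating "-sortColumnSpec 2,0,1 -separator \t".
// Assumption: columns are compared lexicographically as strings, in priority order.
class ColumnSpecComparator implements Comparator<String> {
    private final int[] priorities;       // 0-based column indices, highest priority first
    private final String separatorRegex;  // separator, quoted for use with String.split

    ColumnSpecComparator(String columnSpec, String separator) {
        this.priorities = Arrays.stream(columnSpec.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
        // Quote the separator so characters such as '|' or '.' are matched literally.
        this.separatorRegex = Pattern.quote(separator);
    }

    @Override
    public int compare(String a, String b) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        String[] colsA = a.split(separatorRegex, -1);
        String[] colsB = b.split(separatorRegex, -1);
        for (int col : priorities) {
            String x = col < colsA.length ? colsA[col] : null;
            String y = col < colsB.length ? colsB[col] : null;
            int c = compareNullable(x, y);
            if (c != 0) {
                return c;  // first differing column (in priority order) decides
            }
        }
        return 0;
    }

    // Missing columns (null) sort before present ones.
    private static int compareNullable(String x, String y) {
        if (x == null) {
            return (y == null) ? 0 : -1;
        }
        if (y == null) {
            return 1;
        }
        return x.compareTo(y);
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(List.of(
                "b\t2\tz",
                "a\t1\tz",
                "c\t1\ty"));
        lines.sort(new ColumnSpecComparator("2,0,1", "\t"));
        lines.forEach(System.out::println);
    }
}
```

With the spec 2,0,1, column 2 is compared first, so the "y" line sorts ahead of both "z" lines, and the tie between the "z" lines is broken on column 0.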
thanks,
Arun
Design/Implement a tool to support archival and analysis of logfiles.
---------------------------------------------------------------------

         Key: HADOOP-342
         URL: http://issues.apache.org/jira/browse/HADOOP-342
     Project: Hadoop
        Type: New Feature
    Reporter: Arun C Murthy
Requirements:
  a) Create a tool to support archival of logfiles (from diverse sources) in hadoop's dfs.
  b) The tool should also support analysis of the logfiles via grep/sort primitives. The tool should allow for fairly generic pattern 'grep's and let users 'sort' the matching lines (from grep) on 'columns' of their choice.
     E.g. from hadoop logs: Look for all log-lines with 'FATAL' and sort them based on timestamps (column x) and then on column y (column x, followed by column y).
Design/Implementation:
  a) Log Archival
Archival of logs from diverse sources can be accomplished using the *distcp* tool (HADOOP-341).
  b) Log analysis
The idea is to enable users of the tool to perform analysis of logs via grep/sort primitives. This can be accomplished via a relatively simple Map-Reduce task where the map does the *grep* for the given pattern via RegexMapper, and the implicit *sort* (reducer) is then used with a custom Comparator which performs the user-specified comparison (columns). The grep/sort specs can be fairly powerful, since users of the tool can use Java's built-in regex patterns (java.util.regex).
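To make the split between map, sort and reduce concrete, here is a single-process simulation in plain Java. It assumes each log line is a tab-separated record and that the map stage emits every whole line in which the pattern occurs; Hadoop's RegexMapper is not used, and the class and method names (`LogGrepSort`, `map`, `byColumns`, `run`) are hypothetical. The framework's sort is modeled by sorting the emitted keys with a column comparator, and the reduce is an identity step that returns the lines in order, as they would be written to the single output file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical single-process simulation of the grep/sort job (no Hadoop classes).
class LogGrepSort {

    // "Map": keep lines in which the pattern occurs anywhere (find(), not matches()).
    static List<String> map(List<String> lines, Pattern pattern) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                emitted.add(line);  // the whole line becomes the sort key
            }
        }
        return emitted;
    }

    // Compares 0-based columns in priority order as strings.
    // Assumption: every matched line has all the listed columns.
    static Comparator<String> byColumns(String separator, int... columns) {
        String sepRegex = Pattern.quote(separator);
        return (a, b) -> {
            String[] ca = a.split(sepRegex, -1);
            String[] cb = b.split(sepRegex, -1);
            for (int col : columns) {
                int c = ca[col].compareTo(cb[col]);
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        };
    }

    // "Sort" + identity "reduce": keys come out in comparator order.
    static List<String> run(List<String> lines, String regex, String separator, int... columns) {
        List<String> keys = map(lines, Pattern.compile(regex));
        keys.sort(byColumns(separator, columns));
        return keys;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "2006-07-03 12:05:00\tFATAL\tnode2\tdisk full",
                "2006-07-03 12:01:00\tINFO\tnode1\tstarted",
                "2006-07-03 12:02:00\tFATAL\tnode1\tout of memory");
        // Grep for FATAL, then sort on column 0 (timestamp), then column 2 (host).
        for (String line : run(log, "FATAL", "\t", 0, 2)) {
            System.out.println(line);
        }
    }
}
```

Comparing timestamps as strings only orders them correctly when every line uses the same fixed-width, most-significant-first format, as in the sample lines above; other formats would need a parsing comparator.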
