[ 
http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419011 ] 

Arun C Murthy commented on HADOOP-342:
--------------------------------------

Should have clarified this: the plan is to let the user specify an output 
directory in which a single text file will contain the output of the 'analysis'.

Generic Sorter:

  The generic sorter lets the user specify a column separator and a spec 
giving the priority order of the columns. 
  The Comparator's *compare* function (implements WritableComparable) then 
splits each record on the user-specified separator and compares the two 
records column by column, in the given priority order. 

  E.g. -sortColumnSpec 2,0,1 -separator \t
  (0-based columns: sort on column 2 first, then column 0, then column 1)
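
  To make the column-priority comparison concrete, here is a minimal sketch in 
plain Java. It is an illustration only, not the actual patch: the class name 
ColumnSpecComparator is hypothetical, it uses java.util.Comparator on Strings 
instead of any Hadoop type, and it assumes that a missing column sorts as an 
empty string and that columns compare as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not the HADOOP-342 code: compares two text
// records column by column, in the priority order given by the spec.
class ColumnSpecComparator implements Comparator<String> {
    private final int[] columns;      // 0-based column indices, highest priority first
    private final Pattern separator;  // separator compiled as a literal, e.g. "\t"

    ColumnSpecComparator(String sortColumnSpec, String separator) {
        String[] parts = sortColumnSpec.split(",");
        this.columns = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            columns[i] = Integer.parseInt(parts[i].trim());
        }
        // Quote the separator so regex metacharacters are taken literally.
        this.separator = Pattern.compile(Pattern.quote(separator));
    }

    @Override
    public int compare(String a, String b) {
        // Limit -1 keeps trailing empty columns instead of dropping them.
        String[] left = separator.split(a, -1);
        String[] right = separator.split(b, -1);
        for (int col : columns) {
            // Assumption: a record with too few columns sorts as if the column were empty.
            String l = col < left.length ? left[col] : "";
            String r = col < right.length ? right[col] : "";
            int c = l.compareTo(r);
            if (c != 0) {
                return c;  // first differing column in priority order decides
            }
        }
        return 0;  // equal on every column named in the spec
    }

    public static void main(String[] args) {
        // Invented sample rows, tab-separated.
        List<String> rows = new ArrayList<>(Arrays.asList(
                "b\t2\tx", "a\t1\ty", "c\t1\tx"));
        rows.sort(new ColumnSpecComparator("2,0,1", "\t"));
        rows.forEach(System.out::println);
    }
}
```

  With the spec 2,0,1 the two rows ending in "x" come first, ordered by their 
column 0 values, followed by the row ending in "y".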

  If there is enough interest, I can push this into mapred.lib. Appreciate any 
suggestions.

thanks,
Arun


> Design/Implement a tool to support archival and analysis of logfiles.
> ---------------------------------------------------------------------
>
>          Key: HADOOP-342
>          URL: http://issues.apache.org/jira/browse/HADOOP-342
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Arun C Murthy

>
> Requirements:
>   a) Create a tool to support archival of logfiles (from diverse sources) in 
> hadoop's dfs.
>   b) The tool should also support analysis of the logfiles via grep/sort 
> primitives. The tool should allow for fairly generic pattern 'grep's and let 
> users 'sort' the matching lines (from grep) on 'columns' of their choice.
>   E.g. from hadoop logs: Look for all log-lines with 'FATAL' and sort them 
> based on timestamps (column x) and then on column y (i.e. column x, followed 
> by column y).
> Design/Implementation:
>   a) Log Archival
>     Archival of logs from diverse sources can be accomplished using the 
> *distcp* tool (HADOOP-341).
>   
>   b) Log analysis
>     The idea is to enable users of the tool to perform analysis of logs via 
> grep/sort primitives.
>     This can be accomplished via a relatively simple Map-Reduce task where 
> the map does the *grep* for the given pattern via RegexMapper and then the 
> implicit *sort* (reducer) is used with a custom Comparator which performs the 
> user-specified comparison on the chosen columns. 
>     The sort/grep specs can be fairly powerful by letting the user of the 
> tool use Java's built-in regex patterns (java.util.regex).
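
The grep-then-sort flow in the quoted design can be sketched in a single 
process with plain Java. This is a rough illustration under stated 
assumptions, not the Map-Reduce job itself: no Hadoop APIs are used, the names 
GrepSortSketch, grepAndSort and column are hypothetical, the log lines are 
invented sample data, and columns are compared as plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical single-process sketch of the grep-then-sort flow.
class GrepSortSketch {

    // Keeps the lines that contain a match for regex, then orders them
    // by columnX and, on ties, by columnY (both 0-based).
    static List<String> grepAndSort(List<String> lines, String regex,
                                    String separator, int columnX, int columnY) {
        Pattern pattern = Pattern.compile(regex);
        Pattern sep = Pattern.compile(Pattern.quote(separator));
        return lines.stream()
                .filter(line -> pattern.matcher(line).find())  // the *grep* step
                .sorted((a, b) -> {                            // the *sort* step
                    String[] l = sep.split(a, -1);
                    String[] r = sep.split(b, -1);
                    int c = column(l, columnX).compareTo(column(r, columnX));
                    return c != 0 ? c : column(l, columnY).compareTo(column(r, columnY));
                })
                .collect(Collectors.toList());
    }

    // Assumption: a missing column is treated as an empty string.
    static String column(String[] cols, int i) {
        return i < cols.length ? cols[i] : "";
    }

    public static void main(String[] args) {
        // Invented sample log lines: timestamp, level, source (tab-separated).
        List<String> lines = Arrays.asList(
                "10:02\tFATAL\tdatanode",
                "10:01\tINFO\tnamenode",
                "10:01\tFATAL\ttasktracker",
                "10:01\tFATAL\tjobtracker");
        // Select FATAL lines, sort on timestamp (column 0), then on source (column 2).
        grepAndSort(lines, "FATAL", "\t", 0, 2).forEach(System.out::println);
    }
}
```

In the actual design the filter would run in the map and the ordering would 
come from the framework's sort using the custom Comparator; the sketch only 
shows the intended result on a small input.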

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira