[jira] Commented: (HADOOP-342) Design/Implement a tool to support archival and analysis of logfiles.

Arun C Murthy (JIRA) Wed, 05 Jul 2006 22:17:19 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419402 ]


Arun C Murthy commented on HADOOP-342:
--------------------------------------

I concur with the need for (optional?) HTTP-based map input... I'll start on 
it. 
(I have some ideas about generalising this infrastructure, which I'm 
compiling and will send in a separate email).

Eric: Apologies for not clarifying this earlier: logalyzer (as-is) can be used 
for archival alone, for analysis alone (assuming the logs are already in a 
given directory), or for both together.

Doug: Can we get logalyzer as-is into the tree right away, while I get on 
with the HTTP-based map input enhancement? There is some interest in using 
it right away... hope it isn't too much of a problem.

thanks,
Arun

> Design/Implement a tool to support archival and analysis of logfiles.
> ---------------------------------------------------------------------
>
>          Key: HADOOP-342
>          URL: http://issues.apache.org/jira/browse/HADOOP-342
>      Project: Hadoop
>         Type: New Feature
>
>     Reporter: Arun C Murthy
>  Attachments: logalyzer.patch
>
> Requirements:
>   a) Create a tool to support archival of logfiles (from diverse sources) in 
> hadoop's dfs.
>   b) The tool should also support analysis of the logfiles via grep/sort 
> primitives. The tool should allow for fairly generic pattern 'grep's and let 
> users 'sort' the matching lines (from grep) on 'columns' of their choice.
>   E.g. from hadoop logs: Look for all log-lines with 'FATAL' and sort them 
> based on timestamps (column x)  and then on column y (column x, followed by 
> column y).
> Design/Implementation:
>   a) Log Archival
>     Archival of logs from diverse sources can be accomplished using the 
> *distcp* tool (HADOOP-341).
>   
>   b) Log analysis
>     The idea is to enable users of the tool to perform analysis of logs via 
> grep/sort primitives.
>     This can be accomplished via a relatively simple Map-Reduce task where 
> the map does the *grep* for the given pattern via RegexMapper and then the 
> implicit *sort* (reducer) is used with a custom Comparator which performs the 
> user-specified comparison (on the chosen columns). 
>     The sort/grep specs can be fairly powerful by letting the user of the 
> tool use java's in-built regex patterns (java.util.regex).
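
To make the grep/sort idea above concrete, here is a minimal in-memory sketch in plain Java. It is not the logalyzer patch and does not use the Hadoop API: the class name LogGrepSort, the method names, the whitespace column separator and the sample log lines are all assumptions made for illustration. The filter loop stands in for the map-side grep, and the Comparator stands in for the custom comparator applied during the sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from logalyzer.
class LogGrepSort {

    // Builds a comparator that compares two lines column by column,
    // in the order given by sortColumns (0-based column indices).
    static Comparator<String> columnComparator(String separatorRegex, int[] sortColumns) {
        final Pattern separator = Pattern.compile(separatorRegex);
        return (a, b) -> {
            String[] colsA = separator.split(a);
            String[] colsB = separator.split(b);
            for (int col : sortColumns) {
                // A missing column is treated as an empty string (an assumption).
                String x = col < colsA.length ? colsA[col] : "";
                String y = col < colsB.length ? colsB[col] : "";
                int result = x.compareTo(y);
                if (result != 0) {
                    return result;
                }
            }
            return 0;
        };
    }

    // "grep": keep the lines in which the java.util.regex pattern is found,
    // then "sort": order the matches with the column comparator.
    static List<String> grepAndSort(List<String> lines, String grepRegex,
                                    String separatorRegex, int[] sortColumns) {
        Pattern grep = Pattern.compile(grepRegex);
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (grep.matcher(line).find()) {
                matches.add(line);
            }
        }
        matches.sort(columnComparator(separatorRegex, sortColumns));
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample lines: date, time, level, component, message.
        List<String> logs = Arrays.asList(
            "2006-07-05 22:17:19 FATAL dfs.DataNode disk failure",
            "2006-07-05 22:10:02 INFO mapred.JobTracker job started",
            "2006-07-05 21:59:41 FATAL mapred.TaskTracker lost heartbeat");
        // All FATAL lines, sorted on column 1 (time), then column 3 (component).
        for (String line : grepAndSort(logs, "FATAL", "\\s+", new int[] {1, 3})) {
            System.out.println(line);
        }
    }
}
```

The columns are compared as strings, which is enough for fixed-width timestamps like the ones in the sample; numeric columns would need a parsing comparator instead.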

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
