[
http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419414 ]
Arun C Murthy commented on HADOOP-342:
--------------------------------------
Summary of logalyzer usage:
Logalyzer.0.0.1
Usage:
Logalyzer [-archive -logs <urlsFile>] -archiveDir <archiveDirectory> -grep
<pattern> -sort <column1,column2,...> -separator <separator> -analysis
<outputDirectory>
Usage Scenarios:
---------------------------
a) Archive only:
$ java org.apache.hadoop.tools.Logalyzer -archive -logs <urlsFile> -archiveDir
<archiveDirectory>
Fetch the logs specified in <urlsFile> (an arbitrary combination of dfs- and
http-based logs) and archive them in <archiveDirectory> (in the dfs).
Archival of logs from diverse sources is accomplished using the *distcp* tool
(HADOOP-341).
b) Analyse only:
$ java org.apache.hadoop.tools.Logalyzer -archiveDir <archiveDirectory> -grep
<pattern> -sort <column1,column2,...> -separator <separator> -analysis
<outputDirectory>
Analyse the logs in <archiveDirectory>, i.e. grep for the given pattern and
sort the matching lines on the separator-delimited columns, then store the
output of the 'analysis' (as a single textfile) in <outputDirectory>.
This is accomplished via a Map-Reduce task where the map does the *grep* for
the given pattern via RegexMapper and then the implicit *sort* (reducer) is
used with a custom Comparator which performs the user-specified comparison
(columns).
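To make the grep-then-sort-on-columns idea concrete, here is a minimal,
framework-free sketch in plain Java (no Hadoop APIs; the class name,
method name, and sample log lines are illustrative, not part of the patch).
It mirrors what the map (RegexMapper) and reduce (custom Comparator) steps
do: keep lines matching a pattern, then order them on the chosen columns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class GrepSortSketch {
    // Grep lines matching 'pattern', then sort the matches on the
    // user-chosen columns, split by 'separator', compared in order.
    public static List<String> grepAndSort(List<String> lines,
                                           String pattern,
                                           String separator,
                                           int... columns) {
        Pattern p = Pattern.compile(pattern);
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).find()) {   // the "grep" (map) step
                matches.add(line);
            }
        }
        // Custom comparator over the requested columns, in order;
        // this is the role the custom Comparator plays in the reducer's sort.
        Comparator<String> byColumns = (a, b) -> {
            String[] ca = a.split(separator);
            String[] cb = b.split(separator);
            for (int c : columns) {
                int cmp = ca[c].compareTo(cb[c]);
                if (cmp != 0) return cmp;
            }
            return 0;
        };
        matches.sort(byColumns);            // the implicit "sort" (reduce) step
        return matches;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
            "2006-07-01 10:02 FATAL nodeB disk failure",
            "2006-07-01 10:01 FATAL nodeA oom",
            "2006-07-01 10:01 INFO nodeC heartbeat");
        // Keep FATAL lines, sort by date+time (columns 0,1), then node (column 3).
        for (String s : grepAndSort(logs, "FATAL", "\\s+", 0, 1, 3)) {
            System.out.println(s);
        }
    }
}
```

In the real tool the grep runs in parallel across map tasks and the
framework's sort/shuffle applies the comparator; this sketch just shows the
comparison semantics on a single node.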
c) Archive and analyse
$ java org.apache.hadoop.tools.Logalyzer -archive -logs <urlsFile>
-archiveDir <archiveDirectory> -grep <pattern> -sort <column1,column2,...>
-separator <separator> -analysis <outputDirectory>
Performs both tasks a) and b).
- * - * -
Arun
> Design/Implement a tool to support archival and analysis of logfiles.
> ---------------------------------------------------------------------
>
> Key: HADOOP-342
> URL: http://issues.apache.org/jira/browse/HADOOP-342
> Project: Hadoop
> Type: New Feature
> Reporter: Arun C Murthy
> Attachments: logalyzer.patch
>
> Requirements:
> a) Create a tool to support archival of logfiles (from diverse sources) in
> hadoop's dfs.
> b) The tool should also support analysis of the logfiles via grep/sort
> primitives. The tool should allow for fairly generic pattern 'grep's and let
> users 'sort' the matching lines (from grep) on 'columns' of their choice.
> E.g. from hadoop logs: Look for all log-lines with 'FATAL' and sort them
> based on timestamps (column x) and then on column y (column x, followed by
> column y).
> Design/Implementation:
> a) Log Archival
> Archival of logs from diverse sources can be accomplished using the
> *distcp* tool (HADOOP-341).
>
> b) Log analysis
> The idea is to enable users of the tool to perform analysis of logs via
> grep/sort primitives.
> This can be accomplished via a relatively simple Map-Reduce task where
> the map does the *grep* for the given pattern via RegexMapper and then the
> implicit *sort* (reducer) is used with a custom Comparator which performs the
> user-specified comparison (columns).
> The sort/grep specs can be fairly powerful by letting the user of the
> tool use java's in-built regex patterns (java.util.regex).
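Since the grep spec accepts full java.util.regex syntax, a user can go well
beyond a fixed-string match. A small illustration (the class and method
names below are hypothetical, chosen only for the demo): word boundaries
distinguish the log level 'FATAL' from a longer word that merely contains it.

```java
import java.util.regex.Pattern;

public class RegexGrepDemo {
    // A -grep spec may use any java.util.regex construct; here \b anchors
    // ensure we match the standalone token FATAL, not e.g. "FATALITY".
    private static final Pattern FATAL = Pattern.compile("\\bFATAL\\b");

    public static boolean matchesFatal(String line) {
        return FATAL.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesFatal("2006-07-01 10:01 FATAL nodeA oom"));
        System.out.println(matchesFatal("... FATALITY ..."));
    }
}
```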