On 05-Jul-06, at 11:45 AM, Eric Baldeschwieler wrote:
Yup. Perhaps the HTTP approach would support that too. Long
term, this is one of the reasons atomic appends would be cool:
they would allow us to log in real time to HDFS.
For now, I am working on a Log4J extension for automatically rolling
the older logs (based on time and size) into a well-defined directory
structure in HDFS. The same needs to be done for Map-Reduce jobs, where
all the logs generated by a job go into its own directory (a different
one for each MR job). This can also be used along with log-analysis tools.
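To make the "well defined dir structure" idea concrete, here is a minimal plain-Java sketch of the path layout such a roller might use. The class name, the `/logs/<host>/<date>/` layout, and the timestamp format are all my assumptions for illustration, not the actual extension:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: maps a rolled log file to an HDFS directory
// layout of the form /logs/<host>/<yyyy/MM/dd>/<name>.<HHmmss>.
// The layout and names are illustrative, not the actual extension's.
public class LogPathLayout {
    private static SimpleDateFormat fmt(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    /** Target path in DFS for a log rolled at rollTime. */
    public static String targetPath(String host, String logName, Date rollTime) {
        return "/logs/" + host + "/" + fmt("yyyy/MM/dd").format(rollTime)
             + "/" + logName + "." + fmt("HHmmss").format(rollTime);
    }

    public static void main(String[] args) {
        Date t = new Date(0L);  // the epoch, for a deterministic example
        System.out.println(targetPath("node01", "namenode.log", t));
        // -> /logs/node01/1970/01/01/namenode.log.000000
    }
}
```

Bucketing by day keeps any one directory from growing without bound, and the time suffix keeps successive rolls of the same log from colliding.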
~Sanjay
On Jul 4, 2006, at 2:22 PM, Arkady Borkovsky wrote:
It would be very nice to be able to see (analyze) the logs of a
task that is still running.
-- ab
On Jul 3, 2006, at 12:39 PM, Arun C Murthy (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419011 ]
Arun C Murthy commented on HADOOP-342:
--------------------------------------
Should have clarified this: the plan is to let the user specify
an output directory in which a single text file will contain the
output of the 'analysis'.
Generic Sorter:
The generic sorter lets the user specify a column separator
and a priority spec for the columns.
The Comparator's *compare* function (implements
WritableComparable) splits each record on the user-specified
separator and then compares the two records column by column,
in the given priority order.
E.g. -sortColumnSpec 2,0,1 -separator \t
(0-based columns)
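A plain-Java sketch of that comparison logic, using the spec above (the class name is mine, not the proposed mapred.lib code):

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of the column-priority comparison described above. In the real
// tool this logic would live in a Hadoop comparator; here it is a plain
// java.util.Comparator so it can run stand-alone.
public class ColumnSpecComparator implements Comparator<String> {
    private final String separator;   // e.g. "\t"
    private final int[] columnSpec;   // 0-based priorities, e.g. {2, 0, 1}

    public ColumnSpecComparator(String separator, int[] columnSpec) {
        this.separator = separator;
        this.columnSpec = columnSpec;
    }

    @Override
    public int compare(String a, String b) {
        // String.split takes a regex, so a separator like "|" would
        // need Pattern.quote; "\t" is safe as-is.
        String[] ca = a.split(separator, -1);
        String[] cb = b.split(separator, -1);
        for (int col : columnSpec) {
            int c = ca[col].compareTo(cb[col]);
            if (c != 0) return c;     // first differing column decides
        }
        return 0;                     // equal on all spec'd columns
    }

    public static void main(String[] args) {
        String[] lines = { "b\tx\t2", "a\ty\t1", "c\tz\t1" };
        // equivalent of: -sortColumnSpec 2,0,1 -separator \t
        Arrays.sort(lines, new ColumnSpecComparator("\t", new int[] {2, 0, 1}));
        for (String l : lines) System.out.println(l);
    }
}
```

So with spec 2,0,1 the two lines tied on column 2 ("1") fall back to column 0, putting "a..." before "c...", and the "...2" line sorts last.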
If there is enough interest, I can push this into mapred.lib.
Appreciate any suggestions.
thanks,
Arun
Design/Implement a tool to support archival and analysis of
logfiles.
---------------------------------------------------------------------
Key: HADOOP-342
URL: http://issues.apache.org/jira/browse/HADOOP-342
Project: Hadoop
Type: New Feature
Reporter: Arun C Murthy
Requirements:
a) Create a tool to support archival of logfiles (from diverse
sources) in Hadoop's DFS.
b) The tool should also support analysis of the logfiles via
grep/sort primitives. The tool should allow for fairly generic
pattern 'grep's and let users 'sort' the matching lines (from
the grep) on 'columns' of their choice.
E.g., from the Hadoop logs: look for all log lines containing
'FATAL' and sort them on the timestamp (column x), then on
column y.
Design/Implementation:
a) Log Archival
Archival of logs from diverse sources can be accomplished
using the *distcp* tool (HADOOP-341).
b) Log analysis
The idea is to enable users of the tool to perform analysis
of logs via grep/sort primitives.
This can be accomplished via a relatively simple Map-Reduce
job where the map does the *grep* for the given pattern via
RegexMapper, and the implicit *sort* (reduce) is then used with
a custom Comparator which performs the user-specified column
comparison.
The sort/grep specs can be fairly powerful, since the user of
the tool can use Java's built-in regex patterns
(java.util.regex).
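The grep-then-sort flow can be simulated locally in plain Java; in the actual tool the grep would happen in the map phase (RegexMapper) and the sort in the shuffle/reduce with the custom comparator. The helper name and column choice below are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Local stand-in for the proposed Map-Reduce analysis: a regex filter
// (the map-side grep) followed by a column-priority sort (the reduce-side
// sort with a custom comparator). Names here are illustrative.
public class LogGrepSort {
    public static List<String> grepSort(List<String> lines, String regex,
                                        String sep, int... columns) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).find()) hits.add(line);   // the "grep"
        }
        // the "sort": compare matching lines on the spec'd columns
        hits.sort(Comparator.comparing((String l) -> l.split(sep, -1), (a, b) -> {
            for (int col : columns) {
                int c = a[col].compareTo(b[col]);
                if (c != 0) return c;
            }
            return 0;
        }));
        return hits;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "2006-07-04 INFO starting",
            "2006-07-05 FATAL disk failure",
            "2006-07-03 FATAL oom");
        // keep the FATAL lines, sort on column 0 (the timestamp)
        for (String l : grepSort(log, "FATAL", " ", 0)) System.out.println(l);
    }
}
```

This matches the FATAL example from the requirements: the INFO line is dropped and the two FATAL lines come out in timestamp order.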