On 05-Jul-06, at 11:45 AM, Eric Baldeschwieler wrote:

Yup. Perhaps the HTTP approach would support that too. Long term, this is one of the reasons atomic appends would be cool: they would let us log to HDFS in real time.

For now, I am working on a Log4J extension that automatically rolls older logs (based on time and size) into a well-defined directory structure in HDFS. The same needs to be done for Map-Reduce jobs, where all logs generated by a job go into a single directory (a different one for each MR job). This can also be used together with log-analysis tools.
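To make the idea concrete, here is a minimal sketch of what such an appender could look like. The class name, the hdfsDir property, and the copy-on-roll behavior are my illustration, not the actual extension; it assumes Log4J 1.x's RollingFileAppender and Hadoop's FileSystem API:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.log4j.RollingFileAppender;

  // Hypothetical appender: rolls like RollingFileAppender, then copies
  // the freshly rolled file into a well-defined directory in HDFS.
  public class HdfsRollingAppender extends RollingFileAppender {
    private String hdfsDir = "/logs";          // target directory in HDFS

    public void setHdfsDir(String dir) { this.hdfsDir = dir; }
    public String getHdfsDir() { return hdfsDir; }

    public void rollOver() {
      super.rollOver();                        // normal size-based roll: file -> file.1
      try {
        FileSystem fs = FileSystem.get(new Configuration());
        Path rolled = new Path(getFile() + ".1");
        fs.copyFromLocalFile(rolled, new Path(hdfsDir, rolled.getName()));
      } catch (IOException e) {
        errorHandler.error("could not archive rolled log to HDFS", e, 0);
      }
    }
  }

A time-based roll would work the same way, with the copy hook placed wherever the roll happens.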

~Sanjay


On Jul 4, 2006, at 2:22 PM, Arkady Borkovsky wrote:

It would be very nice to be able to see (analyze) the logs of a task that is still running.

-- ab

On Jul 3, 2006, at 12:39 PM, Arun C Murthy (JIRA) wrote:

[ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419011 ]

Arun C Murthy commented on HADOOP-342:
--------------------------------------

I should have clarified this earlier: the plan is to let the user specify an output directory in which a single text file will contain the output of the 'analysis'.

Generic Sorter:

The generic sorter basically lets the user specify a column separator and a priority spec for the columns. The comparator's *compare* method (operating on WritableComparables) splits each record on the user-specified separator and then compares the two records column by column, in the given priority order.

  E.g. -sortColumnSpec 2,0,1 -separator \t
  (0-based columns)
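For illustration, a minimal sketch of such a comparator, as it might look in mapred.lib. The class name is hypothetical, the spec above is hard-coded rather than read from the JobConf, and it assumes Text keys with lexicographic per-column comparison:

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.io.WritableComparator;

  // Hypothetical comparator for: -sortColumnSpec 2,0,1 -separator \t
  public class ColumnComparator extends WritableComparator {
    private static final String SEPARATOR = "\t";     // a real version would Pattern.quote this
    private static final int[] PRIORITY = { 2, 0, 1 };

    public ColumnComparator() {
      super(Text.class);
    }

    public int compare(WritableComparable a, WritableComparable b) {
      String[] colsA = a.toString().split(SEPARATOR, -1);
      String[] colsB = b.toString().split(SEPARATOR, -1);
      // Compare column by column, in the user-specified priority order;
      // a missing column compares as the empty string.
      for (int i = 0; i < PRIORITY.length; i++) {
        int c = PRIORITY[i];
        String va = c < colsA.length ? colsA[c] : "";
        String vb = c < colsB.length ? colsB[c] : "";
        int cmp = va.compareTo(vb);
        if (cmp != 0) return cmp;
      }
      return 0;
    }
  }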

If there is enough interest, I can push this into mapred.lib. Appreciate any suggestions.

thanks,
Arun


Design/Implement a tool to support archival and analysis of logfiles
---------------------------------------------------------------------

         Key: HADOOP-342
         URL: http://issues.apache.org/jira/browse/HADOOP-342
     Project: Hadoop
        Type: New Feature

    Reporter: Arun C Murthy


Requirements:
  a) Create a tool to support archival of logfiles (from diverse sources) in Hadoop's DFS.
  b) The tool should also support analysis of the logfiles via grep/sort primitives: allow fairly generic pattern 'grep's and let users 'sort' the matching lines on 'columns' of their choice. E.g., from the Hadoop logs: find all log lines containing 'FATAL' and sort them on the timestamp (column x), then on column y.
Design/Implementation:
  a) Log Archival
Archival of logs from diverse sources can be accomplished using the *distcp* tool (HADOOP-341).

  b) Log analysis
The idea is to enable users of the tool to perform analysis of logs via grep/sort primitives. This can be accomplished with a relatively simple Map-Reduce job: the map does the *grep* for the given pattern via RegexMapper, and the framework's implicit *sort* into the reducer uses a custom comparator that performs the user-specified column comparison. The sort/grep specs can be fairly powerful, since the tool can expose Java's built-in regex patterns (java.util.regex).
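A rough sketch of such a job driver, using the old mapred API. The class name is mine, the "mapred.mapper.regex" key is the one I believe RegexMapper reads from the configuration (an assumption, not confirmed here), and ColumnComparator is the comparator sketched earlier:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.RegexMapper;

  public class LogGrepSort {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(LogGrepSort.class);
      conf.setJobName("log-grep-sort");

      // Map: emit only the lines matching the pattern.  RegexMapper
      // reads its pattern from the job configuration.
      conf.setMapperClass(RegexMapper.class);
      conf.set("mapred.mapper.regex", "FATAL");   // assumed key name

      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(LongWritable.class);

      // The implicit sort into the reducer, driven by the user-specified
      // column comparator; one reduce gives the single output file
      // mentioned above.
      conf.setOutputKeyComparatorClass(ColumnComparator.class);
      conf.setNumReduceTasks(1);

      FileInputFormat.addInputPath(conf, new Path(args[0]));     // archived logs
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));   // 'analysis' output dir
      JobClient.runJob(conf);
    }
  }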

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira




