On 05-Jul-06, at 11:45 AM, Eric Baldeschwieler wrote:
Yup. Perhaps the HTTP approach would support that too. Long
term, this is one of the reasons atomic appends would be cool:
they would allow us to log in real time to HDFS.
For now, I am working on a Log4J extension for automatically rolling
the older logs (based on time and size) into a well-defined directory
structure in HDFS. The same needs to be done for Map-Reduce jobs, where
all the logs generated by a job go into its own directory (a different
one for each MR job). This can also be used along with log-analysis tools.
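To make the "well defined dir structure" idea concrete, here is a minimal plain-Java sketch of the path layout such a roller might use. The class name, the `/logs/<host>/<date>/` layout, and the timestamp format are all my assumptions for illustration, not the actual extension:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: maps a rolled log file to an HDFS directory
// layout of the form /logs/<host>/<yyyy/MM/dd>/<name>.<HHmmss>.
// The layout and names are illustrative, not the actual extension's.
public class LogPathLayout {
    private static SimpleDateFormat fmt(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    /** Target path in DFS for a log rolled at rollTime. */
    public static String targetPath(String host, String logName, Date rollTime) {
        return "/logs/" + host + "/" + fmt("yyyy/MM/dd").format(rollTime)
             + "/" + logName + "." + fmt("HHmmss").format(rollTime);
    }

    public static void main(String[] args) {
        Date t = new Date(0L);  // the epoch, for a deterministic example
        System.out.println(targetPath("node01", "namenode.log", t));
        // -> /logs/node01/1970/01/01/namenode.log.000000
    }
}
```

Bucketing by day keeps any one directory from growing without bound, and the time suffix keeps successive rolls of the same log from colliding.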
~Sanjay
On Jul 4, 2006, at 2:22 PM, Arkady Borkovsky wrote:
It would be very nice to be able to see (analyze) the logs of a
task that is still running.
-- ab
On Jul 3, 2006, at 12:39 PM, Arun C Murthy (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419011 ]
Arun C Murthy commented on HADOOP-342:
--------------------------------------
Should have clarified this: the plan is to let the user specify
an output directory in which a single text file will contain the
output of the 'analysis'.
Generic Sorter:
The generic sorter lets the user specify a column separator
and a priority spec for the columns.
The Comparator's *compare* function (implements
WritableComparable) splits each record on the user-specified
separator and then compares the two records column by column,
in the given priority order.
E.g. -sortColumnSpec 2,0,1 -separator \t
(0-based columns)
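A plain-Java sketch of that comparison logic, using the spec above (the class name is mine, not the proposed mapred.lib code):

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of the column-priority comparison described above. In the real
// tool this logic would live in a Hadoop comparator; here it is a plain
// java.util.Comparator so it can run stand-alone.
public class ColumnSpecComparator implements Comparator<String> {
    private final String separator;   // e.g. "\t"
    private final int[] columnSpec;   // 0-based priorities, e.g. {2, 0, 1}

    public ColumnSpecComparator(String separator, int[] columnSpec) {
        this.separator = separator;
        this.columnSpec = columnSpec;
    }

    @Override
    public int compare(String a, String b) {
        // String.split takes a regex, so a separator like "|" would
        // need Pattern.quote; "\t" is safe as-is.
        String[] ca = a.split(separator, -1);
        String[] cb = b.split(separator, -1);
        for (int col : columnSpec) {
            int c = ca[col].compareTo(cb[col]);
            if (c != 0) return c;     // first differing column decides
        }
        return 0;                     // equal on all spec'd columns
    }

    public static void main(String[] args) {
        String[] lines = { "b\tx\t2", "a\ty\t1", "c\tz\t1" };
        // equivalent of: -sortColumnSpec 2,0,1 -separator \t
        Arrays.sort(lines, new ColumnSpecComparator("\t", new int[] {2, 0, 1}));
        for (String l : lines) System.out.println(l);
    }
}
```

So with spec 2,0,1 the two lines tied on column 2 ("1") fall back to column 0, putting "a..." before "c...", and the "...2" line sorts last.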
If there is enough interest, I can push this into mapred.lib.
Appreciate any suggestions.
thanks,
Arun
Design/Implement a tool to support archival and analysis of
logfiles.
---------------------------------------------------------------------
Key: HADOOP-342
URL: http://issues.apache.org/jira/browse/HADOOP-342
Project: Hadoop
Type: New Feature
Reporter: Arun C Murthy
Requirements:
a) Create a tool to support archival of logfiles (from diverse
sources) in Hadoop's DFS.
b) The tool should also support analysis of the logfiles via
grep/sort primitives. The tool should allow for fairly generic
pattern 'grep's and let users 'sort' the matching lines (from
the grep) on 'columns' of their choice.
E.g., from the Hadoop logs: look for all log lines containing
'FATAL' and sort them on the timestamp (column x), then on
column y.
Design/Implementation:
a) Log Archival
Archival of logs from diverse sources can be accomplished
using the *distcp* tool (HADOOP-341).
b) Log analysis
The idea is to enable users of the tool to perform analysis
of logs via grep/sort primitives.
This can be accomplished via a relatively simple Map-Reduce
job where the map does the *grep* for the given pattern via
RegexMapper, and the implicit *sort* (reduce) is then used with
a custom Comparator which performs the user-specified column
comparison.
The sort/grep specs can be fairly powerful, since the user of
the tool can use Java's built-in regex patterns
(java.util.regex).
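The grep-then-sort flow can be simulated locally in plain Java; in the actual tool the grep would happen in the map phase (RegexMapper) and the sort in the shuffle/reduce with the custom comparator. The helper name and column choice below are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Local stand-in for the proposed Map-Reduce analysis: a regex filter
// (the map-side grep) followed by a column-priority sort (the reduce-side
// sort with a custom comparator). Names here are illustrative.
public class LogGrepSort {
    public static List<String> grepSort(List<String> lines, String regex,
                                        String sep, int... columns) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).find()) hits.add(line);   // the "grep"
        }
        // the "sort": compare matching lines on the spec'd columns
        hits.sort(Comparator.comparing((String l) -> l.split(sep, -1), (a, b) -> {
            for (int col : columns) {
                int c = a[col].compareTo(b[col]);
                if (c != 0) return c;
            }
            return 0;
        }));
        return hits;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "2006-07-04 INFO starting",
            "2006-07-05 FATAL disk failure",
            "2006-07-03 FATAL oom");
        // keep the FATAL lines, sort on column 0 (the timestamp)
        for (String l : grepSort(log, "FATAL", " ", 0)) System.out.println(l);
    }
}
```

This matches the FATAL example from the requirements: the INFO line is dropped and the two FATAL lines come out in timestamp order.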