[BCC-ing "general" - again.]

On Tuesday 17 August 2010 07:36 AM, Scott Whitecross wrote:
Thanks for the answers Doug and Arun.   I'm assuming the job-history files
mentioned are in ./hadoop-0.20/logs/history/done/.  The files look like they
were serialized by a class in Hadoop?  (If I can read the files back into
the appropriate class, and then dump them out into a custom format, that'd
be great.)

Rumen (src/tools/org/apache/hadoop/tools/rumen/) parses Job History files
and creates JSON files that can either be loaded independently, or via
the API provided by Rumen itself. As an added benefit, it abstracts away
the differences between the 0.20.xx format and the Avro-based format used
in trunk.
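As a rough illustration of what consuming Rumen's JSON output could look like, here is a minimal Python sketch that pulls job names out of a trace. This assumes the trace is a JSON array of per-job objects and that each object carries a "jobName" field; the exact field names and file layout depend on your Rumen version, so check the actual output before relying on this.

```python
import json

def job_names_from_trace(trace_text):
    """Extract job names from a Rumen-style JSON trace.

    Assumptions (not verified against any specific Rumen release):
    - the trace is a JSON array of per-job objects
    - each object may have a "jobName" field
    """
    names = []
    for job in json.loads(trace_text):
        # Fall back to a placeholder when the name is absent.
        names.append(job.get("jobName", "<unnamed>"))
    return names
```

With a trace in hand, something like `job_names_from_trace(open("trace.json").read())` would give you the list of names to filter by date or pattern.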

There is not much documentation on Rumen right now, but MAPREDUCE-1918
(https://issues.apache.org/jira/browse/MAPREDUCE-1918) attempts to fix
that.

HTH,
Ranjit

On Thu, Aug 12, 2010 at 12:52 AM, Arun C Murthy <a...@yahoo-inc.com> wrote:

Moving to mapreduce-user@, bcc gene...@.

There isn't a direct way. One possible option is to just use the per-job
job-history file, which is on HDFS (see
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Submission+and+Monitoring
for info on job-history).

Hope that helps.

Arun


On Aug 11, 2010, at 8:54 AM, Scott Whitecross wrote:

  Hi -

What's the best way to list and query information on Hadoop job histories?
For example, I'd like to see the job names from the past week against a
Hadoop cluster I'm using.   I don't see an API call or a way through the
command line to pull the information.  Is the best way writing a quick
script to process the job history files?

Thanks.
Scott



