On Jul 22, 2009, at 8:22 AM, Rares Vernica wrote:
Hello,
I wonder how the Yahoo! developers generated the Task Timeline
figures in their "Hadoop Sorts a Petabyte..." blog post:
The script is at:
http://people.apache.org/~omalley/tera-2009/job_history_summary.py
The input is the job history log from the run:
http://people.apache.org/~omalley/tera-2009/1t/job_200904102259_0008_arunc_TeraSort.log.gz
is the 1 TB run's log file. Uncompress it and feed it to the script on
standard input. It will generate:
http://people.apache.org/~omalley/tera-2009/1t/summary.lst
Load the bottom part of that file into your favorite spreadsheet to
generate the pretty graphs.
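For anyone who wants to script those steps, here is a minimal Python sketch of the same pipeline (the equivalent of gunzip -c on the log, piped into the script, with the output redirected to summary.lst). The function name summarize_job_log and its parameters are hypothetical. It also assumes that the script runs under a Python 2 interpreter named python2. If yours is installed under a different name, pass that name instead.

```python
import gzip
import shutil
import subprocess


def summarize_job_log(log_gz_path, script_path="job_history_summary.py",
                      out_path="summary.lst", interpreter="python2"):
    """Hypothetical helper: decompress a gzipped job history log and stream
    it into the summary script's standard input, saving its output.

    Roughly: gunzip -c LOG.gz | python2 job_history_summary.py > summary.lst
    """
    # Assumption: the script is Python 2 code, hence the default interpreter.
    with gzip.open(log_gz_path, "rb") as log, open(out_path, "wb") as out:
        proc = subprocess.Popen([interpreter, script_path],
                                stdin=subprocess.PIPE, stdout=out)
        # Copy the uncompressed log into the script's stdin in chunks, so the
        # whole log never has to sit in memory at once.
        shutil.copyfileobj(log, proc.stdin)
        proc.stdin.close()  # signal end of input to the script
        if proc.wait() != 0:
            raise RuntimeError("summary script exited with status %d"
                               % proc.returncode)
    return out_path
```

For example, with both files downloaded into the current directory, summarize_job_log("job_200904102259_0008_arunc_TeraSort.log.gz") writes summary.lst to that directory.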
-- Owen