On Jul 22, 2009, at 8:22 AM, Rares Vernica wrote:

Hello,

I wonder how the Yahoo! developers generated the Task Timeline
figures in their "Hadoop Sorts a Petabyte..." blog post:

The script is at:

http://people.apache.org/~omalley/tera-2009/job_history_summary.py

The input data is the job log from the run. The 1 TB run's log file is:

http://people.apache.org/~omalley/tera-2009/1t/job_200904102259_0008_arunc_TeraSort.log.gz

Uncompress it and feed it as standard input to the script. It will generate:

http://people.apache.org/~omalley/tera-2009/1t/summary.lst

Feed the bottom part of that file into your favorite spreadsheet to generate the pretty graphs.
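
For anyone who wants to automate the uncompress-and-pipe step, here is a minimal sketch in Python. It assumes both files above have already been downloaded locally. The function name summarize_job_log and its parameters are made up for illustration, and which interpreter the 2009 script needs is also an assumption (it may require Python 2), so the interpreter is left configurable.

```python
import gzip
import shutil
import subprocess
import sys


def summarize_job_log(log_gz, script, out_path, interpreter=sys.executable):
    """Decompress log_gz, pipe it to script on stdin, save its stdout to out_path.

    All names here are hypothetical; interpreter defaults to the Python
    running this code, which may not be the one the script was written for.
    """
    with gzip.open(log_gz, "rb") as log, open(out_path, "wb") as out:
        proc = subprocess.Popen(
            [interpreter, script], stdin=subprocess.PIPE, stdout=out
        )
        # Stream the decompressed log into the script's standard input,
        # then close the pipe so the script sees end-of-file.
        with proc.stdin:
            shutil.copyfileobj(log, proc.stdin)
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("%s exited with status %d" % (script, returncode))
```

A call such as summarize_job_log("job_200904102259_0008_arunc_TeraSort.log.gz", "job_history_summary.py", "summary.lst", interpreter="python2") would then write the summary file, provided a python2 executable is on the PATH (again an assumption).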

-- Owen
