David,

Since data generation and sorting are separate Hadoop jobs, you can generate the data once and sort the same data as many times as you want.
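
A minimal sketch of that workflow, in case it helps: one `teragen` run followed by several `terasort` runs over the same input. The jar name, HDFS paths, row count, and the helper `build_benchmark_commands` are all placeholders I made up for illustration; the examples jar in particular is named differently across Hadoop versions, so adjust for your install. The script only builds and prints the command lines, it does not run them.

```python
from typing import List


def build_benchmark_commands(
    examples_jar: str,
    num_rows: int,
    input_dir: str,
    output_prefix: str,
    sort_runs: int,
) -> List[List[str]]:
    """Return one teragen command followed by `sort_runs` terasort commands.

    All arguments are hypothetical placeholders; nothing is executed here.
    """
    # Generate the data set once: teragen <number of rows> <output dir>.
    commands = [
        ["hadoop", "jar", examples_jar, "teragen", str(num_rows), input_dir]
    ]
    for run in range(1, sort_runs + 1):
        # Every sort reads the same input directory but writes to its own
        # output directory, so earlier results are not overwritten.
        commands.append(
            ["hadoop", "jar", examples_jar, "terasort",
             input_dir, f"{output_prefix}-{run}"]
        )
    return commands


# Print the command lines for one generation and three sorts.
for cmd in build_benchmark_commands(
    "hadoop-examples.jar", 10_000_000, "/bench/tera-in", "/bench/tera-out", 3
):
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell, or passed to `subprocess.run(cmd, check=True)` if you would rather drive the runs from Python.
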
I don't think TeraGen is deterministic (or rather, the keys are random but the text is deterministic, if I remember correctly).

Raj

>________________________________
> From: David Erickson <halcyon1...@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Saturday, April 14, 2012 1:53 PM
> Subject: Is TeraGen's generated data deterministic?
>
> Hi we are doing some benchmarking of some of our infrastructure and
> are using TeraGen/TeraSort to do the benchmarking. I am wondering if
> the data generated by TeraGen is deterministic, in that if I repeat
> the same experiment multiple times with the same configuration options
> if it will continue to generate and sort the exact same data? And if
> not, is there an easy mod to make this happen?
>
> Thanks!
> David