That is a good idea, but it doesn't work in my case. What I want to test is how well my partitioner divides the workload. It is supposed to counteract skew, not generate it, so I still need a skewed data source. Any ideas?
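One direction that might work is to generate the input directly, drawing record keys from a bounded Zipf distribution so that the skew level becomes a parameter. The following is only a minimal sketch in Python, not something taken from TPC-DS or Hadoop. The names zipf_cdf and skewed_keys and the parameters num_keys, s, and seed are hypothetical, and it assumes plain integer keys are enough to exercise the partitioner. With s = 0 the keys are uniform; a larger s concentrates more records on the lowest-numbered keys.

```python
import bisect
import itertools
import random
from collections import Counter


def zipf_cdf(num_keys, s):
    """Cumulative probabilities for ranks 1..num_keys, with P(rank k) proportional to 1 / k**s."""
    weights = [1.0 / (k ** s) for k in range(1, num_keys + 1)]
    total = sum(weights)
    cdf = list(itertools.accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating-point rounding in the last entry
    return cdf


def skewed_keys(num_records, num_keys, s, seed=0):
    """Yield num_records integer keys in [0, num_keys); key 0 is the most frequent when s > 0."""
    rng = random.Random(seed)  # fixed seed so a test run can be reproduced
    cdf = zipf_cdf(num_keys, s)
    for _ in range(num_records):
        # Inverse-CDF sampling: find the first rank whose cumulative probability covers the draw.
        yield bisect.bisect_left(cdf, rng.random())


if __name__ == "__main__":
    # Hypothetical sizes; print the five heaviest keys to eyeball the skew.
    counts = Counter(skewed_keys(100_000, 1000, s=1.2))
    print(counts.most_common(5))
```

Writing each key out as one tab-separated line (key, then some payload) would give a text input whose skew is controlled by s alone, so the same job could be run at several skew levels.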
Thanks,
-Gang

----- Original Message -----
From: Aaron Kimball <aa...@cloudera.com>
To: common-user@hadoop.apache.org
Sent: 2010/3/3 (Wed) 3:50:59 PM
Subject: Re: dataset

Look at writing your own Partitioner implementation to control which records are sent to which reduce shards.

- Aaron

On Wed, Mar 3, 2010 at 12:15 PM, Gang Luo <lgpub...@yahoo.com.cn> wrote:

> Hi all,
> I want to generate some datasets with data skew to test my MapReduce jobs.
> I am using TPC-DS, but it seems I cannot control the data skew level. There
> is a suite from Microsoft that can generate skewed datasets based on
> TPC-D, but it only works on Windows. I haven't succeeded in getting it to
> compile on Linux yet. Please tell me how I can get a skewed dataset.
>
> Thanks.
> -Gang