Hi, I believe you need to add the partition file to the distributed cache so that all tasks have access to it. The TeraSort code uses this sampler; you can refer to that if needed.
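A minimal sketch of what that looks like with the old mapred API used in your code (the partition file path here is an assumption; TeraSort places it under the input directory and symlinks it into each task's working directory):

```java
// Before submitting the job: pin the partition file to a known path,
// write it with the sampler, then ship it to every task via the
// DistributedCache with a symlink so TotalOrderPartitioner can find it.
Path partitionFile = new Path(inputDir, "_partition.lst"); // assumed location
TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
InputSampler.writePartitionFile(conf, sampler);

URI partitionUri = new URI(partitionFile.toString() + "#_partition.lst");
DistributedCache.addCacheFile(partitionUri, conf);
DistributedCache.createSymlink(conf);
```

Without the DistributedCache step, the partitioner running in each task looks for _partition.lst in its local working directory and fails with the FileNotFoundException you are seeing.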
Amogh

On 12/15/09 5:06 PM, "afarsek" <adji...@gmail.com> wrote:

Hi, I'm using the InputSampler.RandomSampler to perform partition sampling. It should create a file called _partition.lst that should be used later by the partitioner class. For some reason it doesn't work and I get a java.io.FileNotFoundException: File _partition.lst does not exist. Below is the code: it consists of a map-only job, taking as input a file in a SequenceFileInputFormat that was generated by a previous job. Thanks a lot in advance for any insights.

public class WordCountSorted {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, IntWritable, Text> {

        // private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<IntWritable, Text> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            String[] tokens = line.split("\t");
            int nbOccurences = Integer.parseInt(tokens[1]);
            word.set(tokens[0]);
            output.collect(new IntWritable(nbOccurences), word);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountSorted.class);
        conf.setJobName("wordcountsorted");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setNumReduceTasks(2);

        InputSampler.Sampler<IntWritable, Text> sampler =
            new InputSampler.RandomSampler<IntWritable, Text>(0.1, 100, 10);
        InputSampler.writePartitionFile(conf, sampler);
        conf.setPartitionerClass(TotalOrderPartitioner.class);

        JobClient.runJob(conf);
    }
}

--
View this message in context: http://old.nabble.com/File-_partition.lst-does-not-exist.-tp26793409p26793409.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.