Hi,
I believe you need to add the partition file to the distributed cache so that
every task can read it.
The TeraSort code uses this sampler; you can refer to it if needed.
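A minimal sketch of that suggestion. It assumes the old org.apache.hadoop.mapred API used in the question and assumes the partition file was left at its default name, TotalOrderPartitioner.DEFAULT_PATH ("_partition.lst"). The class and method names (PartitionFileCache, addPartitionFileToCache) are hypothetical. The helper qualifies the path the sampler wrote to and registers it as a cache file with a "#_partition.lst" fragment. It then enables symlinks, so each task should find _partition.lst in its working directory. That is where TotalOrderPartitioner looks for the default path, on the local file system.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

/** Hypothetical helper: ships the sampler's partition file to every task. */
public final class PartitionFileCache {
    private PartitionFileCache() {}

    /**
     * Assumes the partition file path was NOT changed with setPartitionFile,
     * so InputSampler.writePartitionFile wrote the relative path
     * "_partition.lst" on the default file system (usually HDFS).
     */
    public static void addPartitionFileToCache(JobConf conf)
            throws IOException, URISyntaxException {
        String name = TotalOrderPartitioner.DEFAULT_PATH;
        // Resolve the relative path the same way the sampler did.
        Path partitionFile = new Path(name).makeQualified(FileSystem.get(conf));
        // The "#_partition.lst" fragment names the symlink that is created
        // in each task's working directory.
        URI partitionUri = new URI(partitionFile.toString() + "#" + name);
        DistributedCache.addCacheFile(partitionUri, conf);
        DistributedCache.createSymlink(conf);
    }
}
```

In main(), the call would go right after InputSampler.writePartitionFile(conf, sampler) and before JobClient.runJob(conf).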

Amogh


On 12/15/09 5:06 PM, "afarsek" <adji...@gmail.com> wrote:



Hi,
I'm using InputSampler.RandomSampler to sample the input for partitioning. It
should create a file called _partition.lst, which is then used by the
partitioner class (TotalOrderPartitioner).
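For context, the partition file is a SequenceFile holding numReduceTasks - 1 sorted split keys. When a task starts, TotalOrderPartitioner reads it. For key types that are not BinaryComparable (IntWritable, as here), it assigns each map output key to a reducer by binary search over those split points. The sketch below is a simplified, Hadoop-free model of that lookup. It assumes int keys stand in for IntWritable, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical model of TotalOrderPartitioner's binary-search lookup (int keys). */
public final class SplitPointLookup {
    // Sorted split points; length = numReduceTasks - 1.
    private final int[] splitPoints;

    public SplitPointLookup(int[] splitPoints) {
        // Defensive copy, sorted (the real partition file is already sorted).
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    /** Returns the reduce partition, 0 .. splitPoints.length, for the key. */
    public int findPartition(int key) {
        // binarySearch returns the index if found, else -(insertionPoint) - 1.
        int pos = Arrays.binarySearch(splitPoints, key) + 1;
        // Found at index i -> partition i + 1; not found -> insertionPoint.
        return (pos < 0) ? -pos : pos;
    }

    public static void main(String[] args) {
        // With 2 reduce tasks the partition file holds a single split point.
        SplitPointLookup lookup = new SplitPointLookup(new int[] {50});
        System.out.println(lookup.findPartition(7));   // 0
        System.out.println(lookup.findPartition(50));  // 1
        System.out.println(lookup.findPartition(120)); // 1
    }
}
```

Keys equal to a split point go to the higher partition, which follows directly from the Arrays.binarySearch return convention.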

For some reason it doesn't work, and I get a
java.io.FileNotFoundException: File _partition.lst does not exist.
Below is the code. The job defines only a mapper (the reducer is
IdentityReducer), and its input is a SequenceFile, read with
SequenceFileInputFormat, that was generated by a previous job.

Thanks a lot in advance for any insights.

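import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.InputSampler;
import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

public class WordCountSorted {

    // Emits (occurrence count, word) so that the job output is keyed on the count.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, IntWritable, Text> {
        // private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<IntWritable, Text> output, Reporter reporter)
                throws IOException {
            // Each input value is "word<TAB>count".
            String line = value.toString();
            String[] tokens = line.split("\t");
            int nbOccurences = Integer.parseInt(tokens[1]);
            word.set(tokens[0]);
            output.collect(new IntWritable(nbOccurences), word);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountSorted.class);
        conf.setJobName("wordcountsorted");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        conf.setInputFormat(SequenceFileInputFormat.class);

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(IdentityReducer.class);

        conf.setNumReduceTasks(2);

        // Sample records with probability 0.1, keeping up to 100 samples from
        // at most 10 splits, and write the split points to the default
        // partition file, _partition.lst.
        InputSampler.Sampler<IntWritable, Text> sampler =
                new InputSampler.RandomSampler<IntWritable, Text>(0.1, 100, 10);
        InputSampler.writePartitionFile(conf, sampler);

        conf.setPartitionerClass(TotalOrderPartitioner.class);

        JobClient.runJob(conf);
    }
}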
--
View this message in context: 
http://old.nabble.com/File-_partition.lst-does-not-exist.-tp26793409p26793409.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.