Hi,
I would still like to use the new API. What I am trying to do now is to
submit a job from Java code rather than through the command line
interface. How do I do this? This is what I do at the moment:
* Clean start up of Hadoop (formatted file system and all)
* Using the standard WordCount Mapper and Reducer, I wrote this main method:
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public static void main(String[] args) throws IOException,
        InterruptedException, ClassNotFoundException {
    Configuration configuration = new Configuration();
    InetSocketAddress socket = new InetSocketAddress("localhost", 9001);
    Cluster cluster = new Cluster(socket, configuration);
    FileSystem fs = cluster.getFileSystem();
    Path homeDirectory = fs.getHomeDirectory();
    // INPUT and OUTPUT are String constants defined elsewhere in the class.
    Path input = new Path(homeDirectory, INPUT);
    Path output = new Path(homeDirectory, OUTPUT);
    fs.delete(output, true);
    fs.copyFromLocalFile(
            new Path("resources/test/wordcount/data/ipsum.txt"),
            new Path(input, "input.txt"));
    Job job = Job.getInstance(cluster);
    //1 job.addArchiveToClassPath(new Path("release/test.jar"));
    //2 job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount.class"));
    //  job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Map.class"));
    //  job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Reduce.class"));
    job.setJarByClass(WordCount.class);
    job.setMapperClass(Map.class);
    job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, input);
    FileOutputFormat.setOutputPath(job, output);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
* I tried to run this code as is in Eclipse.
* Obviously, I guess, Hadoop needed the WordCount classes, so I got this
error:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
de.fstyle.hadoop.tutorial.wordcount.WordCount$Map
* Putting everything into a jar and adding the following line did not
help:
job.addArchiveToClassPath(new Path("release/test.jar"));
* Adding each class separately throws the same exception:
job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount.class"));
job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Map.class"));
job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Reduce.class"));
* Using
job.setJar("release/test.jar");
will get me:
java.io.FileNotFoundException: File
/tmp/hadoop-martin/mapred/staging/martin/.staging/job_201009221802_0033/job.jar
does not exist.
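As a sanity check I am now thinking of something like the following
helper (my own sketch; the setJobJar name is mine, and whether handing
Hadoop an absolute path instead of a relative one makes any difference
is just a guess on my part):

```java
import java.io.File;
import java.io.FileNotFoundException;

import org.apache.hadoop.mapreduce.Job;

public class JarCheck {
    // Hypothetical helper: fail fast if the job jar is missing on the
    // local file system, and pass Hadoop an absolute path so the result
    // does not depend on the Eclipse working directory.
    static void setJobJar(Job job, String jarPath) throws FileNotFoundException {
        File jar = new File(jarPath);
        if (!jar.exists()) {
            throw new FileNotFoundException(
                    "Job jar not found: " + jar.getAbsolutePath());
        }
        job.setJar(jar.getAbsolutePath());
    }
}
```

I would call this as setJobJar(job, "release/test.jar") right before
waitForCompletion.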
So how would I set this up and use it correctly? Sorry, I did not find
any tutorials or examples anywhere.
Martin
On 22.09.2010 18:29, Tom White wrote:
Note that JobClient, along with the rest of the "old" API in
org.apache.hadoop.mapred, has been undeprecated in Hadoop 0.21.0 so
you can continue to use it without warnings.
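A minimal old-API submission then looks roughly like this (a sketch,
untested; the WordCountOld class name is just for illustration, and the
mapper and reducer would be the org.apache.hadoop.mapred
implementations):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountOld {
    public static void main(String[] args) throws Exception {
        // JobConf takes the class whose containing jar should be shipped.
        JobConf conf = new JobConf(WordCountOld.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // conf.setMapperClass(...) / conf.setReducerClass(...) with the
        // old-API (org.apache.hadoop.mapred) mapper and reducer classes.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // submits the job and blocks until it finishes
    }
}
```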
Tom
On Wed, Sep 22, 2010 at 2:43 AM, Amareshwari Sri Ramadasu
<amar...@yahoo-inc.com> wrote:
In 0.21, JobClient methods are available in org.apache.hadoop.mapreduce.Job
and org.apache.hadoop.mapreduce.Cluster classes.
On 9/22/10 3:07 PM, "Martin Becker"<_martinbec...@web.de> wrote:
Hello,
I am using Hadoop MapReduce version 0.20.2 and will soon move to 0.21.
I wanted to use the JobClient class to circumvent the use of the command
line interface.
I noticed that JobClient still uses the deprecated JobConf class for
job submissions.
Are there any alternatives to JobClient not using the deprecated JobConf
class?
Thanks in advance,
Martin