Hi,

The Hadoop World meetup last year had some very interesting business solutions discussed:
http://www.cloudera.com/company/press-center/hadoop-world-nyc/

Most of the companies there have shared their methodology on their blogs or on SlideShare. One I have handy is:
http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig
It shows how Y! Search Assist is implemented.
Amogh

On 2/19/10 12:48 AM, "C Berg" <icey...@yahoo.com> wrote:

Hi Eric,

Thanks for the advice; it is very much appreciated. With your help I was able to get past the mechanical part to something a bit more substantive: wrapping my head around doing an actual business calculation in a MapReduce way. Any recommendations on tutorials that cover real-world examples other than word counting and the like?

Thanks again,
Cory

--- On Thu, 2/18/10, Eric Arenas <eare...@rocketmail.com> wrote:

> From: Eric Arenas <eare...@rocketmail.com>
> Subject: Re: basic hadoop job help
> To: common-user@hadoop.apache.org
> Date: Thursday, February 18, 2010, 10:52 AM
>
> Hi Cory,
>
> Regarding the part that you are not sure about:
>
> String inputdir = args[0];
> String outputdir = args[1];
> int numberReducers = Integer.parseInt(args[2]);
> // It is better to at least pass the number of reducers as a
> // parameter, or read it from the XML job config file.
>
> // Setting the number of reducers to 1, as you had in your code,
> // *might* make it slower to process and generate the output.
> // If you are trying to sell the idea of Hadoop as a new ETL tool,
> // you want it to be as fast as you can.
>
> ...................
>
> job.setNumReduceTasks(numberReducers);
> FileInputFormat.setInputPaths(job, inputdir);
> FileOutputFormat.setOutputPath(job, new Path(outputdir));
>
> return job.waitForCompletion(true) ? 0 : 1;
>
> } // end of run method
>
> Unless you copy/paste your code, I do not see why you need to set
> "setWorkingDirectory" in your M/R job.
>
> Give this a try and let me know.
>
> Regards,
> Eric Arenas
>
>
> ----- Original Message ----
> From: Cory Berg <icey...@yahoo.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, February 18, 2010 9:07:54 AM
> Subject: basic hadoop job help
>
> Hey all,
>
> I'm trying to get Hadoop up and running as a proof of concept to make
> an argument for moving away from a big RDBMS.
> I'm having some challenges just getting a really simple demo
> MapReduce job to run. The examples I have seen on the web tend to
> make use of classes that are now deprecated in the latest Hadoop
> (0.20.1), and it is not clear in some cases what the equivalent newer
> classes are.
>
> Anyway, I am stuck at this exception; here it is start to finish:
> ---------------
> $ ./bin/hadoop jar ./testdata/RetailTest.jar RetailTest testdata outputdata
> 10/02/18 09:24:55 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 10/02/18 09:24:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/02/18 09:24:55 INFO input.FileInputFormat: Total input paths to process : 5
> 10/02/18 09:24:56 INFO input.FileInputFormat: Total input paths to process : 5
> Exception in thread "Thread-13" java.lang.IllegalStateException: Shutdown in progress
>         at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
>         at java.lang.Runtime.addShutdownHook(Runtime.java:192)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1387)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>         at org.apache.hadoop.mapred.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:61)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:245)
> ------------
>
> Now here is the code that actually starts things up (not including
> the actual mapreduce code).
> I initially suspected this code because I was guessing at the correct
> non-deprecated classes to use:
>
> public int run(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Job job2 = new Job(conf);
>     job2.setJobName("RetailTest");
>     job2.setJarByClass(RetailTest.class);
>     job2.setMapperClass(RetailMapper.class);
>     job2.setReducerClass(RetailReducer.class);
>     job2.setOutputKeyClass(Text.class);
>     job2.setOutputValueClass(Text.class);
>     job2.setNumReduceTasks(1);
>     // this was a guess on my part as I could not find the "recommended way"
>     job2.setWorkingDirectory(new Path(args[0]));
>     FileInputFormat.setInputPaths(job2, new Path(args[0]));
>     FileOutputFormat.setOutputPath(job2, new Path(args[1]));
>     job2.submit();
>     return 0;
> }
>
> /**
>  * @param args
>  */
> public static void main(String[] args) throws Exception {
>     int res = ToolRunner.run(new RetailTest(), args);
>     System.exit(res);
> }
>
> Can someone sanity check me here? Much appreciated.
>
> Regards,
>
> Cory
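A note on the stack trace above: the "Shutdown in progress" IllegalStateException means something tried to register a JVM shutdown hook after JVM shutdown had already begun. Here that something is most likely Hadoop's FileSystem cache during job cleanup: run() returns immediately after job2.submit(), main() calls System.exit(), and the JVM starts shutting down while LocalJobRunner is still finishing the job. The JVM behavior itself is easy to reproduce without Hadoop (class name is illustrative):

```java
public class ShutdownHookDemo {
    public static void main(String[] args) {
        // Register a hook that runs once JVM shutdown has started.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                // By now shutdown is in progress, so registering another
                // hook fails exactly like Hadoop's FileSystem cache did
                // in the stack trace above.
                Runtime.getRuntime().addShutdownHook(new Thread(() -> { }));
            } catch (IllegalStateException e) {
                System.out.println("caught: " + e.getMessage());
            }
        }));
        System.out.println("main returning; shutdown begins");
    }
}
```

Run as a plain Java program; when main returns and the hook fires, the inner addShutdownHook call fails with the same "Shutdown in progress" message seen in the trace.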
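Putting Eric's suggestion into Cory's driver, a corrected RetailTest under the 0.20 API might look like the sketch below. It assumes the RetailMapper/RetailReducer classes from the original post (not shown there), drops the setWorkingDirectory guess, and blocks on waitForCompletion(true) instead of returning right after submit(), so the JVM does not begin shutting down while the job is still running:

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RetailTest extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration ToolRunner prepared (picks up -D options etc.).
        Job job = new Job(getConf());
        job.setJobName("RetailTest");
        job.setJarByClass(RetailTest.class);
        job.setMapperClass(RetailMapper.class);
        job.setReducerClass(RetailReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1);
        // No setWorkingDirectory call: the input/output paths are enough.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Block until the job finishes instead of submit()-and-return,
        // so cleanup runs before the JVM starts to exit.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new RetailTest(), args));
    }
}
```

Extending Configured also addresses the "Use GenericOptionsParser for parsing the arguments" warning in the log, since ToolRunner then parses the generic options before run() is called.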