Hi,

The Hadoop World meetup last year had some very interesting business solutions discussed:
http://www.cloudera.com/company/press-center/hadoop-world-nyc/

Most of the companies there have shared their methodology on their blogs or on SlideShare. One I have handy is:
http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig
It shows how Y! Search Assist is implemented.
Amogh

On 2/19/10 12:48 AM, "C Berg" <icey...@yahoo.com> wrote:

Hi Eric,

Thanks for the advice; it is very much appreciated. With your help I was able to get past the mechanical part to something a bit more substantive: wrapping my head around doing an actual business calculation in a MapReduce way. Any recommendations on tutorials that cover real-world examples other than word counting and the like?

Thanks again,
Cory

--- On Thu, 2/18/10, Eric Arenas <eare...@rocketmail.com> wrote:

> From: Eric Arenas <eare...@rocketmail.com>
> Subject: Re: basic hadoop job help
> To: common-user@hadoop.apache.org
> Date: Thursday, February 18, 2010, 10:52 AM
>
> Hi Cory,
>
> Regarding the part that you are not sure about:
>
> String inputdir = args[0];
> String outputdir = args[1];
> int numberReducers = Integer.parseInt(args[2]);
> // It is better to at least pass the number of reducers as a
> // parameter, or read it from the XML job config file.
>
> // Setting the number of reducers to 1, as you had in your code,
> // *might* make it slower to process and generate the output.
> // If you are trying to sell the idea of Hadoop as a new ETL tool,
> // you want it to be as fast as you can.
>
> ...................
>
> job.setNumReduceTasks(numberReducers);
> FileInputFormat.setInputPaths(job, inputdir);
> FileOutputFormat.setOutputPath(job, new Path(outputdir));
>
> return job.waitForCompletion(true) ? 0 : 1;
>
> } // end of run method
>
> Unless you copy/paste your code, I do not see why you need to set
> "setWorkingDirectory" in your M/R job.
>
> Give this a try and let me know.
>
> Regards,
> Eric Arenas
>
>
> ----- Original Message ----
> From: Cory Berg <icey...@yahoo.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, February 18, 2010 9:07:54 AM
> Subject: basic hadoop job help
>
> Hey all,
>
> I'm trying to get Hadoop up and running as a proof of concept to make
> an argument for moving away from a big RDBMS.
> I'm having some challenges just getting a really simple demo
> MapReduce job to run. The examples I have seen on the web tend to
> make use of classes that are now deprecated in the latest Hadoop
> (0.20.1), and it is not clear in some cases what the equivalent newer
> classes are.
>
> Anyway, I am stuck at this exception; here it is start to finish:
> ---------------
> $ ./bin/hadoop jar ./testdata/RetailTest.jar RetailTest testdata outputdata
> 10/02/18 09:24:55 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 10/02/18 09:24:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/02/18 09:24:55 INFO input.FileInputFormat: Total input paths to process : 5
> 10/02/18 09:24:56 INFO input.FileInputFormat: Total input paths to process : 5
> Exception in thread "Thread-13" java.lang.IllegalStateException: Shutdown in progress
>         at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
>         at java.lang.Runtime.addShutdownHook(Runtime.java:192)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1387)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>         at org.apache.hadoop.mapred.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:61)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:245)
> ------------
>
> Now here is the code that actually starts things up (not including
> the actual mapreduce code).
> I initially suspected this code because I was guessing at the correct
> non-deprecated classes to use:
>
> public int run(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Job job2 = new Job(conf);
>     job2.setJobName("RetailTest");
>     job2.setJarByClass(RetailTest.class);
>     job2.setMapperClass(RetailMapper.class);
>     job2.setReducerClass(RetailReducer.class);
>     job2.setOutputKeyClass(Text.class);
>     job2.setOutputValueClass(Text.class);
>     job2.setNumReduceTasks(1);
>     // this was a guess on my part as I could not find the "recommended way"
>     job2.setWorkingDirectory(new Path(args[0]));
>     FileInputFormat.setInputPaths(job2, new Path(args[0]));
>     FileOutputFormat.setOutputPath(job2, new Path(args[1]));
>     job2.submit();
>     return 0;
> }
>
> /**
>  * @param args
>  */
> public static void main(String[] args) throws Exception {
>     int res = ToolRunner.run(new RetailTest(), args);
>     System.exit(res);
> }
>
> Can someone sanity check me here? Much appreciated.
>
> Regards,
>
> Cory
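A note on the stack trace above: the "Shutdown in progress" IllegalStateException means something tried to register a JVM shutdown hook after JVM shutdown had already begun. Here that something is most likely Hadoop's FileSystem cache during job cleanup: run() returns immediately after job2.submit(), main() calls System.exit(), and the JVM starts shutting down while LocalJobRunner is still finishing the job. The JVM behavior itself is easy to reproduce without Hadoop (class name is illustrative):

```java
public class ShutdownHookDemo {
    public static void main(String[] args) {
        // Register a hook that runs once JVM shutdown has started.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                // By now shutdown is in progress, so registering another
                // hook fails exactly like Hadoop's FileSystem cache did
                // in the stack trace above.
                Runtime.getRuntime().addShutdownHook(new Thread(() -> { }));
            } catch (IllegalStateException e) {
                System.out.println("caught: " + e.getMessage());
            }
        }));
        System.out.println("main returning; shutdown begins");
    }
}
```

Run as a plain Java program; when main returns and the hook fires, the inner addShutdownHook call fails with the same "Shutdown in progress" message seen in the trace.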
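Putting Eric's suggestion into Cory's driver, a corrected RetailTest under the 0.20 API might look like the sketch below. It assumes the RetailMapper/RetailReducer classes from the original post (not shown there), drops the setWorkingDirectory guess, and blocks on waitForCompletion(true) instead of returning right after submit(), so the JVM does not begin shutting down while the job is still running:

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RetailTest extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration ToolRunner prepared (picks up -D options etc.).
        Job job = new Job(getConf());
        job.setJobName("RetailTest");
        job.setJarByClass(RetailTest.class);
        job.setMapperClass(RetailMapper.class);
        job.setReducerClass(RetailReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1);
        // No setWorkingDirectory call: the input/output paths are enough.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Block until the job finishes instead of submit()-and-return,
        // so cleanup runs before the JVM starts to exit.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new RetailTest(), args));
    }
}
```

Extending Configured also addresses the "Use GenericOptionsParser for parsing the arguments" warning in the log, since ToolRunner then parses the generic options before run() is called.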