I use IntelliJ, though Eclipse works too. I don't have any Hadoop-specific plug-ins; both IDEs are just set up as vanilla Java programming environments.
Chapter 5 of *Hadoop: The Definitive Guide* <http://www.librarything.com/work/8488103> has a good overview of testing methodology; it's what I follow. I always run code in local single-JVM mode first so that I can step through it in the IDE's debugger, and only when I've got that working do I try to deploy to a cluster. For debugging scale-up bugs that only happen on the cluster I rely on Log4j logging, though I add code that lets the log level be set via a Hadoop configuration parameter so that I can run different jobs at different log levels on the same cluster. I also do my best to factor my logic apart from the Hadoop boilerplate so that I can unit test the former with JUnit 4. More details on my testing methodology here: http://cornercases.wordpress.com/2011/04/08/unit-testing-mapreduce-with-the-adapter-pattern/

In my organization we have a single cluster that is used for research, testing, and production work. This works fine; just make sure to set up HDFS permissions so that you don't accidentally delete work. We also have a separate one-node cluster that is used for controlled measurement of the wall-clock performance of Hadoop jobs.

We write directly to the Map/Reduce interface instead of using higher-level tools like Pig or Cascading. Those tools look like they would be helpful, but no one here has had the time to learn them yet.

All the code reuse techniques I employ are described in the link above. For each job I end up directly subclassing Hadoop's Mapper and Reducer classes. I find those to already be at the right level of generality and haven't had cause to add any further encapsulation.

On Thu, Apr 7, 2011 at 12:39 AM, Guy Doulberg <guy.doulb...@conduit.com> wrote:

> Hey,
>
> I have been developing Map/Red jars for a while now, and I am still not
> comfortable with the development environment I have gathered for myself
> (and the team).
>
> I am curious how other Hadoop developers out there are developing their
> jobs...
>
> What IDE are you using?
> What plugins to the IDE are you using?
> How do you test your code? Which unit test libraries are you using? How do
> you run your automatic tests after you have finished the development?
> Do you have test/QA/staging environments besides dev and production? How
> do you keep them similar to production?
> Code reuse - how do you build components that can be reused in other jobs?
> Do you build generic map or reduce classes?
>
> I can tell you that I have no answers to the questions above.
>
> I hope this post is not too general, but I think the discussion here could
> be helpful for newbie and experienced developers alike.
>
> Thanks,
> Guy
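P.S. To make the "factor the logic apart from the Hadoop boilerplate" point concrete, here's a minimal sketch in the spirit of the adapter-pattern post linked above. The class and method names are mine, not from that post: the tokenizing logic lives in a plain Java class with no Hadoop imports, so JUnit 4 can exercise it directly, and the Mapper subclass is just a thin adapter that converts Writable types and delegates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Pure logic, no Hadoop types: this is the class the unit tests target.
public class WordExtractor {
    // Splits a line into lowercase word tokens; empty tokens are dropped.
    public List<String> extract(String line) {
        List<String> words = new ArrayList<String>();
        for (String token : line.split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }
}

// The corresponding Mapper is then a thin adapter over the pure class,
// along these lines (requires the Hadoop jars on the classpath):
//
//   public class WordExtractorMapper
//           extends Mapper<LongWritable, Text, Text, IntWritable> {
//       private final WordExtractor extractor = new WordExtractor();
//
//       @Override
//       protected void map(LongWritable key, Text value, Context context)
//               throws IOException, InterruptedException {
//           for (String word : extractor.extract(value.toString())) {
//               context.write(new Text(word), new IntWritable(1));
//           }
//       }
//   }
```

A JUnit 4 test then instantiates WordExtractor directly, with no MiniMRCluster or mock Context needed; only the thin Mapper shell is left untested by plain unit tests.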