I use IntelliJ, though Eclipse works too.  I don't have any Hadoop-specific
plug-ins; both IDEs are just set up as vanilla Java programming
environments.

Chapter 5 of *Hadoop: The Definitive Guide*
<http://www.librarything.com/work/8488103> has a good overview of testing
methodology. It's what I follow. I always
run code in local single-JVM mode so that I can step through it in the IDE's
debugger. Only when I've got that working do I try to deploy to a cluster.
For debugging scale-up bugs that only happen on the cluster I rely on Log4j
logging, though I add code that allows the log level to be set via a Hadoop
configuration parameter so that I can run different jobs at different log
levels on the same cluster. I also do my best to factor my logic apart from
the Hadoop boilerplate so that I can unit test the former with JUnit 4. More
details on my testing methodology here:
http://cornercases.wordpress.com/2011/04/08/unit-testing-mapreduce-with-the-adapter-pattern/
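
By "local single-JVM mode" I mean a driver configuration that forces the
local job runner and the local file system. A minimal sketch, using the
old 0.20-style property names:

    import org.apache.hadoop.conf.Configuration;

    // Force local, single-JVM execution so that breakpoints set in the
    // mapper and reducer are hit inside the IDE's debugger.
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "local");
    conf.set("fs.default.name", "file:///");

Pass that configuration to the Job and everything runs in one debuggable
process.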
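
The configurable log level is just a few lines in setup(). A sketch,
assuming Log4j 1.x (which ships with Hadoop); the parameter name
"my.log.level" is made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class ConfigurableLogMapper
        extends Mapper<LongWritable, Text, Text, Text> {
      private static final Logger LOG =
          Logger.getLogger(ConfigurableLogMapper.class);

      @Override
      protected void setup(Context context) {
        // Each job picks its own verbosity from the job configuration,
        // so two jobs on the same cluster can log at different levels.
        Configuration conf = context.getConfiguration();
        LOG.setLevel(Level.toLevel(conf.get("my.log.level", "INFO")));
      }
    }

If your driver goes through ToolRunner you can then pass
-D my.log.level=DEBUG on the command line without touching the
cluster-wide log4j.properties.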
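
And the factoring itself, in miniature (class names invented for
illustration): the logic is a plain Java class with no Hadoop imports,
the Mapper is a thin adapter around it, and JUnit 4 tests the logic
without any Hadoop runtime:

    // WordExtractor.java - plain logic, trivially testable
    public class WordExtractor {
      public String[] extract(String line) {
        return line.toLowerCase().trim().split("\\s+");
      }
    }

    // WordCountMapper.java - thin adapter between Hadoop and the logic
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final WordExtractor extractor = new WordExtractor();
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        for (String w : extractor.extract(value.toString())) {
          word.set(w);
          context.write(word, ONE);
        }
      }
    }

    // WordExtractorTest.java - JUnit 4, no cluster needed
    import static org.junit.Assert.assertArrayEquals;
    import org.junit.Test;

    public class WordExtractorTest {
      @Test
      public void splitsAndLowercases() {
        assertArrayEquals(new String[] { "a", "b" },
            new WordExtractor().extract("A  b"));
      }
    }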

In my organization we have a single cluster that is used for research,
testing, and production work. This works fine. Just make sure to
set up HDFS permissions so that you don't accidentally delete work. We also
have a separate one-node cluster that is used for controlled measurement of
wall-clock performance of Hadoop jobs.
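
Concretely, "set up HDFS permissions" just means chmod-ing the shared
directories so that only the owning account can write. From the shell
that's "hadoop fs -chmod 755 /data/production"; programmatically it's one
call (the path is only an example, and this assumes dfs.permissions has
not been disabled in hdfs-site.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class LockDownProduction {
      public static void main(String[] args) throws Exception {
        // Assumes the cluster configuration is on the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        // 0755: owner can write, everyone else is read-only, so a stray
        // delete from another account can't remove production data.
        fs.setPermission(new Path("/data/production"),
            new FsPermission((short) 0755));
      }
    }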

We write directly to the Map/Reduce interface instead of using higher-level
tools like Pig or Cascading. Those tools look like they would be helpful, but
no one has had the time to learn how to use them yet. All the
code reuse techniques I employ are described in the link above. For each job
I end up directly subclassing Hadoop's Mapper and Reducer classes. I find
those to already be at the right level of generality and haven't had cause
to add any further encapsulation.
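
"Directly subclassing" looks like this in practice; here is the reducer
half of the word-count sketch above (example names again):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
      private final IntWritable total = new IntWritable();

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values,
          Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        total.set(sum);
        context.write(key, total);
      }
    }

Mapper and Reducer give you setup()/cleanup() hooks and type parameters;
in practice that has been all the generality we've needed.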


On Thu, Apr 7, 2011 at 12:39 AM, Guy Doulberg <guy.doulb...@conduit.com> wrote:

> Hey,
>
> I have been developing Map/Red jars for a while now, and I am still not
> comfortable with the development environment I have put together for myself
> (and the team).
>
> I am curious how other Hadoop developers out there are developing their
> jobs...
>
> What IDE are you using?
> What plugins to the IDE are you using?
> How do you test your code? Which unit-test libraries are you using? How do
> you run your automated tests after you have finished development?
> Do you have test/QA/staging environments besides dev and production? How do
> you keep them similar to production?
> Code reuse - how do you build components that can be used in other jobs? Do
> you build generic map or reduce classes?
>
> I can tell you that I have no answer to the questions above.
>
> I hope that this post is not too general, but I think the discussion here
> could be helpful for newbie and experienced developers alike.
>
> Thanks Guy
>
