On Thu, Sep 4, 2008 at 10:07 AM, Tenaali Ram <[EMAIL PROTECTED]> wrote:


> Has anyone used hadoop for merely number crunching? If yes, how should I
> define input for the job and ensure that computations are distributed to
> all
> nodes in the grid?


Yeah, it is pretty easy to do, actually. If your tasks are really just
independent computations, you can set the number of reduces to 0. The
output of each map is then given straight to the OutputFormat, which
typically writes it into HDFS.
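To make that concrete, here is a rough sketch of a map-only driver and
mapper against the classic org.apache.hadoop.mapred API. The class names
(NumberCruncher, CrunchMapper) and the length-of-line "computation" are
made up for illustration; swap in whatever your real work is.

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class NumberCruncher {

  // Hypothetical mapper: does the computation for one input line.
  public static class CrunchMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      String work = line.toString();
      // Stand-in for the real number crunching.
      String result = Integer.toString(work.length());
      output.collect(new Text(work), new Text(result));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(NumberCruncher.class);
    conf.setJobName("number-crunch");

    // Each input line is one unit of work; TextInputFormat hands the
    // lines of the input files to the maps.
    conf.setInputFormat(TextInputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    conf.setMapperClass(CrunchMapper.class);
    // No reduce phase: map output goes straight to the OutputFormat,
    // which writes it into HDFS under the output path.
    conf.setNumReduceTasks(0);

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    JobClient.runJob(conf);
  }
}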

I wrote the Dancing Links example
<http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/dancing/package-summary.html#package_description>
to do state-space exploration in map/reduce. Although a backtracking
algorithm seems like an unlikely match, it worked well. I generated the
search prefixes up to a given level and wrote them out one per line. Each
map gets a set of lines and explores the entire tree downward from the
prefixes it is given. A single reduce collects the answers. In 9 hours on
a very small cluster it was able to solve a problem that Knuth had given
up on as taking too long.
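The mapper side of that pattern looks roughly like the sketch below. This
is not the actual Dancing Links code; explore() stands in for the real
backtracking search, and the driver would be set up like the one above
but with conf.setNumReduceTasks(1) so a single reduce gathers the
solutions.

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class PrefixSearchMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable offset, Text prefixLine,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Backtrack over the whole subtree rooted at this prefix and emit
    // every complete solution under a common key, so the single reduce
    // sees them all together.
    for (String solution : explore(prefixLine.toString())) {
      output.collect(new Text("solution"), new Text(solution));
    }
    // In the real search loop you would call reporter.progress()
    // periodically so a long-running map is not timed out.
  }

  // Placeholder for the real backtracking search below a prefix.
  private List<String> explore(String prefix) {
    return Collections.emptyList();
  }
}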

If you look at the Powered By Hadoop page
<http://wiki.apache.org/hadoop/PoweredBy>, you'll see more examples.

-- Owen
