So to be "distributed" in a sense, you would want to do your computation on the 
disconnected parts of data in the map phase I would guess?
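
Something like the sketch below, maybe? This is just a minimal example
against the old org.apache.hadoop.mapred API; expensiveComputation()
here is a hypothetical stand-in for whatever per-record work the job
would actually do, and the reduce only sums up what the maps emit.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Each map task works on one independent split of the input, so the
// heavy per-record computation happens here, in parallel across nodes.
public class ComputeMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  public void map(LongWritable offset, Text record,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    // Hypothetical per-record computation.
    int result = expensiveComputation(record.toString());
    output.collect(new Text("total"), new IntWritable(result));
  }

  private int expensiveComputation(String record) {
    return record.length();  // placeholder for the real work
  }
}

// The reduce phase only aggregates what the maps emitted.
class SumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}

If that's right, the expensive part runs once per split in parallel,
and the reduce just folds the partial results together.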

Terrence A. Pietrondi
http://del.icio.us/tepietrondi


--- On Wed, 10/1/08, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> From: Arun C Murthy <[EMAIL PROTECTED]>
> Subject: Re: architecture diagram
> To: core-user@hadoop.apache.org
> Date: Wednesday, October 1, 2008, 2:16 PM
> On Oct 1, 2008, at 10:17 AM, Terrence A. Pietrondi wrote:
> 
> > I am trying to plan out my map-reduce implementation and I have
> > some questions about where computation should be split in order
> > to take advantage of the distributed nodes.
> >
> > Looking at the architecture diagram
> > (http://hadoop.apache.org/core/images/architecture.gif), are the
> > map boxes the major computation areas, or is the reduce the major
> > computation area?
> >
> 
> Usually the maps perform the 'embarrassingly parallel' computational
> steps, wherein each map works independently on a 'split' of your
> input, and the reduces perform the 'aggregate' computations.
> 
> From http://hadoop.apache.org/core/:
> 
> Hadoop implements MapReduce, using the Hadoop Distributed File
> System (HDFS). MapReduce divides applications into many small
> blocks of work. HDFS creates multiple replicas of data blocks for
> reliability, placing them on compute nodes around the cluster.
> MapReduce can then process the data where it is located.
> 
> The Hadoop Map-Reduce framework is quite good at scheduling your
> 'maps' on the actual data-nodes where the input blocks are present,
> leading to I/O efficiencies...
> 
> Arun
> 
> > Thanks.
> >
> > Terrence A. Pietrondi
> >