Ranjith,

MapReduce and HDFS are two different things. MapReduce uses HDFS (and
can use any other FS as well) to do some efficient work, but HDFS does
not use MapReduce.

A simple HDFS transfer goes directly over the network. Yes, it's just
a block-by-block copy/write to/from the relevant DataNodes, done over
network sockets at each end.
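
To make the block-by-block idea concrete, here is a toy sketch in plain
Python. It is not the HDFS client code and does not talk to a NameNode or
DataNodes; the function name and the block size are made up for
illustration. It only shows one client thread pushing data one block at a
time, with no mappers involved.

```python
import io


def copy_in_blocks(src, dst, block_size):
    """Copy src to dst one block at a time from a single thread.

    Hypothetical helper for illustration only. src and dst are ordinary
    file-like objects standing in for the local file and the remote
    DataNode stream. Returns the number of blocks written.
    """
    blocks = 0
    while True:
        chunk = src.read(block_size)  # read at most one block
        if not chunk:                 # empty read means end of input
            break
        dst.write(chunk)              # push this block to the destination
        blocks += 1
    return blocks


# Usage: a 10-byte "file" copied with a 4-byte block size.
source = io.BytesIO(b"0123456789")
target = io.BytesIO()
written = copy_in_blocks(source, target, block_size=4)
print(written, target.getvalue())
```

The whole copy is a single sequential loop, which is why it is not
distributed on its own.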

On Tue, May 22, 2012 at 8:58 AM, Ranjith <ranjith.raghuna...@gmail.com> wrote:
> Thanks, Harsh. So when it connects directly to the data nodes it does not fire 
> off any mappers. So how does it get the data over? Is it just a block by 
> block copy?
>
> Thanks,
> Ranjith
>
> On May 21, 2012, at 9:22 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Ranjith,
>>
>> Are you speaking of DistCp?
>> http://hadoop.apache.org/common/docs/current/distcp.html
>>
>> An 'fs -copyFromLocal' otherwise just runs as a single program that
>> connects to your DFS nodes and writes data from a single client
>> thread, and is not distributed on its own.
>>
>> On Tue, May 22, 2012 at 6:48 AM, Ranjith <ranjith.raghuna...@gmail.com> 
>> wrote:
>>>
>>> I have always wondered about this and am not sure what actually happens. 
>>> When I fire a MapReduce job to copy data over in a distributed fashion, I 
>>> would expect to see mappers executing the copy. What happens with a copy 
>>> command from hadoop fs?
>>>
>>> Thanks,
>>> Ranjith
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J
