For copying large files, I prefer distcp.
On Sun, Apr 14, 2013 at 11:31 PM, Ted Dunning wrote:
>
>
>
> On Sun, Apr 14, 2013 at 10:33 AM, Mathias Herberts <
> mathias.herbe...@gmail.com> wrote:
>
> This is absolutely true. Distcp dominates cp for large copies. On the
> other hand, cp dominates distcp for convenience.
>
> In my own experience, I love cp when copying relatively small amounts of
> data (10s of GB) where the available bandwidth of about a GB/s allows the
> copy to complete in les
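To put rough numbers on the small-copy case described above, here is a back-of-the-envelope sketch in Python. The 1 GB/s bandwidth is the figure quoted in the message; the example sizes and the helper name `copy_seconds` are illustrative assumptions, and the calculation ignores every kind of overhead.

```python
def copy_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal transfer time for a single-stream copy, ignoring all overhead."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


# 1 GB/s is the bandwidth quoted in the thread; the sizes are assumed examples.
for size_gb in (10, 30, 90):
    seconds = copy_seconds(size_gb, 1.0)
    print(f"{size_gb} GB at 1 GB/s: about {seconds:.0f} s")
```

Under these idealized assumptions, a copy of a few tens of GB is a matter of seconds to a couple of minutes, which is the convenience argument being made for plain cp.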
>>>>>> Also how does the cp command work if the file is distributed on
>>>>>> different data nodes?
>>>>>>
>>>>>> Thanks
>>>>>> Kay
>>>>> CP appears (correct me if I'm wrong) to simply invoke the FileSystem
>>>>> and issue a copy command for every source file.
>>>>>
>>>>> I have an additional question: how is CP, which is internal to a
>>>>> cluster, optimized (if at all)?
>>>
>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 wrote:
>>>> Hi,
>>>>
>>>> I think it's better to use Copy in the same cluster and distCP
>>>> between clusters, and the cp command is a Hadoop-internal parallel
>>>> process and will not copy files locally.
>>>>
>>>> --
>>>> 麦树荣
DistCP is a full-blown MapReduce job (mapper only, where the mappers do a
"fully" parallel copy to the destination).
CP appears (correct me if I'm wrong) to simply invoke the FileSystem and
issue a copy command for every source file.
I have an additional question: how is CP, which is internal to a
cluster, optimized (if at all)?
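The contrast drawn above (one client issuing a copy per source file versus a job whose mappers each copy part of the file list) can be modeled with a small sketch. This is not Hadoop code: it runs on the local filesystem, threads stand in for mappers, and the function names `copy_sequential` and `copy_partitioned` are hypothetical. A real DistCP job runs its copy tasks on cluster nodes rather than in one process.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_sequential(sources, dest_dir):
    """cp-like model: a single client copies each source file in turn."""
    for src in sources:
        shutil.copy(src, dest_dir / src.name)


def copy_partitioned(sources, dest_dir, workers=4):
    """distcp-like model: split the file list into chunks and let each
    worker (standing in for a mapper) copy its own chunk."""
    chunks = [sources[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(lambda chunk: copy_sequential(chunk, dest_dir), chunks))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src_dir, seq_dir, par_dir = root / "src", root / "seq", root / "par"
        for d in (src_dir, seq_dir, par_dir):
            d.mkdir()
        sources = []
        for i in range(8):
            path = src_dir / f"part-{i:05d}"
            path.write_text(f"record {i}\n")
            sources.append(path)
        copy_sequential(sources, seq_dir)
        copy_partitioned(sources, par_dir)
        same = sorted(p.name for p in seq_dir.iterdir()) == sorted(
            p.name for p in par_dir.iterdir()
        )
        print("both strategies copied the same files:", same)
```

Both strategies produce the same result; the difference is only in how the work is divided, which is why the partitioned approach pays off as the amount of data grows.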
> *From:* KayVajj
> *Date:* 2013-04-11 06:20
> *To:* user@hadoop.apache.org
> *Subject:* Copy Vs DistCP
> I have a few questions regarding the usage of DistCP for copying
> files in the same cluster.
>
> 1) Which one is better within the same cluster, and what factors (like file
> size etc.) would influence the usage of one over the other?
>
> 2) When we run a cp command like below from a client node of the cluster
> (