Re: Copy Vs DistCP

2013-04-15 Thread Amal G Jose
For copying large files, I prefer distcp. On Sun, Apr 14, 2013 at 11:31 PM, Ted Dunning wrote: > On Sun, Apr 14, 2013 at 10:33 AM, Mathias Herberts <mathias.herbe...@gmail.com> wrote: >> This is absolutely true. Distcp dominates cp for large copies. On the other hand c

Re: Copy Vs DistCP

2013-04-14 Thread Ted Dunning
On Sun, Apr 14, 2013 at 10:33 AM, Mathias Herberts <mathias.herbe...@gmail.com> wrote: > This is absolutely true. Distcp dominates cp for large copies. On the other hand cp dominates distcp for convenience. > In my own experience, I love cp when copying relatively small amounts

Re: Copy Vs DistCP

2013-04-14 Thread Mathias Herberts
> This is absolutely true. Distcp dominates cp for large copies. On the other hand cp dominates distcp for convenience. > In my own experience, I love cp when copying relatively small amounts of data (tens of GB) where the available bandwidth of about a GB/s allows the copy to complete in les
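As a back-of-the-envelope check of the figures mentioned in this message (tens of GB at roughly 1 GB/s), here is a minimal Python sketch. The function name and the 30 GB sample size are illustrative assumptions, not something stated in the thread, and the model ignores everything except raw bandwidth.

```python
def estimate_copy_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time: data size divided by sustained bandwidth.

    Assumption: a single copy stream that saturates the given bandwidth;
    real copies also pay for replication, disk I/O and NameNode round trips.
    """
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    # Hypothetical example: 30 GB over a link sustaining about 1 GB/s.
    seconds = estimate_copy_seconds(30, 1.0)
    print(f"estimated copy time: about {seconds:.0f} s")
```

Under these assumptions a copy of a few tens of GB finishes in well under a minute, which is consistent with preferring plain cp for that size range.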

Re: Copy Vs DistCP

2013-04-14 Thread Ted Dunning
>>> Also how does the cp command work if the file is distributed on different data nodes? >>>>>> Thanks >>>>>> Kay >>>>>> On We

Re: Copy Vs DistCP

2013-04-14 Thread Mathias Herberts
re the mappers do a "fully" parallel copy to the destination). >>>>>> CP appears (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file. >>

Re: Copy Vs DistCP

2013-04-13 Thread Ted Dunning
estination). >>>>> CP appears (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file. >>>>> I have an additional question: how is CP which is internal to a

Re: Copy Vs DistCP

2013-04-12 Thread Lance Norskog
*To:* user@hadoop.apache.org *Subject:* Copy Vs DistCP I have a few questions regarding the usage of DistCP for copying files in the same cluster. 1) Which one

Re: Copy Vs DistCP

2013-04-11 Thread Azuryy Yu
rs (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file. >>>>> I have an additional question: how is CP, which is internal to a cluster, optimized (if at all)? >>>

Re: Copy Vs DistCP

2013-04-11 Thread KayVajj
>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 wrote: >>>>> Hi, >>>>> I think it's better using Copy in the same cluster while using distCP between clusters, and cp command is a hadoop in

Re: Copy Vs DistCP

2013-04-11 Thread Jay Vyas
ernal to a cluster optimized (if at all)? >>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 wrote: >>>>> Hi, >>>>> I think it's better using Copy in the same cluster while us

Re: Copy Vs DistCP

2013-04-11 Thread Azuryy Yu
d (if at all)? >>>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 wrote: >>>>>> Hi, >>>>>> I think it's better us

Re: Copy Vs DistCP

2013-04-11 Thread Hemanth Yamijala
>>>> optimized (if at all)? >>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 wrote: >>>>> Hi, >>>>> I think it's better using Copy in the sam

Re: Copy Vs DistCP

2013-04-10 Thread KayVajj
better using Copy in the same cluster while using distCP between clusters, and cp command is a hadoop internal parallel process and will not copy files locally. >>>> -- >>>> 麦树荣 >>>>

Re: Copy Vs DistCP

2013-04-10 Thread Alexander Pivovarov
>>>> Hi, >>>> I think it's better using Copy in the same cluster while using distCP between clusters, and cp command is a hadoop internal parallel process and will not copy files locally. >>>> ---

Re: Copy Vs DistCP

2013-04-10 Thread Azuryy Yu
hink it's better using Copy in the same cluster while using distCP between clusters, and cp command is a hadoop internal parallel process and will not copy files locally. >>> -- >>> 麦树荣 >>> *From:* KayVajj >>

Re: Copy Vs DistCP

2013-04-10 Thread KayVajj
hile using distCP between clusters, and cp command is a hadoop internal parallel process and will not copy files locally. >> -- >> 麦树荣 >> *From:* KayVajj >> *Date:* 2013-04-11 06:20 >> *To:* user@hadoop.apa

Re: Copy Vs DistCP

2013-04-10 Thread Jay Vyas
DistCP is a full-blown mapreduce job (mapper only, where the mappers do a "fully" parallel copy to the destination). CP appears (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file. I have an additional question: how is CP which is internal to a
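To make the two invocations compared above concrete, here is a minimal Python sketch that only assembles the command lines for `hadoop fs -cp` and `hadoop distcp` (with distcp's optional `-m` cap on the number of map tasks). The helper names and the `/data/in` and `/data/out` paths are hypothetical and chosen for illustration; nothing is executed against a cluster.

```python
import shlex
from typing import List, Optional


def build_cp_command(src: str, dst: str) -> List[str]:
    """Client-side copy: the shell client copies the source files itself."""
    return ["hadoop", "fs", "-cp", src, dst]


def build_distcp_command(src: str, dst: str,
                         max_maps: Optional[int] = None) -> List[str]:
    """Map-only MapReduce copy; -m caps the number of simultaneous maps."""
    cmd = ["hadoop", "distcp"]
    if max_maps is not None:
        if max_maps < 1:
            raise ValueError("max_maps must be at least 1")
        cmd += ["-m", str(max_maps)]
    return cmd + [src, dst]


def render(cmd: List[str]) -> str:
    """Quote each argument so the line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(render(build_cp_command("/data/in", "/data/out")))
    print(render(build_distcp_command("/data/in", "/data/out", max_maps=20)))
```

Building the argument list separately from running it keeps the sketch safe to try on a machine without Hadoop installed; passing the list to `subprocess.run` would be the obvious next step on a real client node.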

Re: Copy Vs DistCP

2013-04-10 Thread Jay Vyas
> will not copy files locally. > -- > 麦树荣 > *From:* KayVajj > *Date:* 2013-04-11 06:20 > *To:* user@hadoop.apache.org > *Subject:* Copy Vs DistCP > I have a few questions regarding the usage of DistCP for copying files in the same cluster. >

Re: Copy Vs DistCP

2013-04-10 Thread 麦树荣
20 To: user@hadoop.apache.org Subject: Copy Vs DistCP I have a few questions regarding the usage of DistCP for copying files in the same cluster. 1) Which one is better within the same cluster, and what factors (like file size etc.) would influence the usage of one

Copy Vs DistCP

2013-04-10 Thread KayVajj
I have a few questions regarding the usage of DistCP for copying files in the same cluster. 1) Which one is better within the same cluster, and what factors (like file size etc.) would influence the usage of one over the other? 2) When we run a cp command like below from a client node of the cluster (