The -update behavior is by design.

If I am right, -update is meant to overwrite the file at the destination if it
is already there. But in this case it is overwriting the folder with a
file at the destination, which seems to be a bug.


-update will replace the file if the source and destination sizes differ.

Copying a single file, particularly with -update, is a corner case for distcp. It distributes the copy at file granularity, so there is no advantage to using it in this case. In 0.15, IIRC, -update and -overwrite control only the "overwrite" argument in the call to create, which will replace directories when true. Many of these semantic oddities have been addressed in subsequent versions. The interpretation of the source and destination paths is slightly different when either of the two aforementioned options is set, as is covered in the documentation to be released with 0.18, currently available in subversion. -C
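
As a rough illustration of the two rules stated in this thread (anything already at the destination is skipped by default; with -update a file is replaced when the source and destination sizes differ), here is a minimal Python sketch. The function name should_copy and its parameters are hypothetical and are not taken from the distcp source. It models only the per-file decision, not the 0.15 directory-replacement behavior mentioned above.

```python
def should_copy(dst_exists, src_size, dst_size, update=False):
    """Hypothetical model of the per-file decision described in the thread."""
    if not dst_exists:
        # Nothing at the destination yet, so the file is copied.
        return True
    if update:
        # With -update, replace only when the sizes differ.
        return src_size != dst_size
    # Without -update, anything already at the destination is skipped.
    return False


if __name__ == "__main__":
    # Same size with -update: treated as up to date.
    print(should_copy(dst_exists=True, src_size=4, dst_size=4, update=True))
    # Different size with -update: replaced.
    print(should_copy(dst_exists=True, src_size=4, dst_size=7, update=True))
    # Existing destination without -update: skipped.
    print(should_copy(dst_exists=True, src_size=4, dst_size=7, update=False))
```

Under this model an empty destination always results in a copy, which is why the "Files skipped=1" counter reported for an empty destination looks unexpected.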





Could you provide the command line, and the directory structure before and after issuing the copy? -C



Cmd is: hadoop distcp -update 'hftp://<srchost>:50070/user/<user>/distcpsrc' distcp_dest



hadoop dfs -lsr distcpsrc
/user/<user>/distcpsrc/1 <dir>           2008-07-24 05:53
/user/<user>/distcpsrc/1/t       <r 3>   4       2008-07-22 06:12



hadoop dfs -lsr distcp_dest
/user/<user>/distcp_dest/1       <r 3>   4       2008-07-24 06:03
<< expected /user/<user>/distcp_dest/1/t; the file is copied as '1' instead of '1/t'
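
The expectation stated in the listing above, that each file keeps its path relative to the source root, can be sketched as follows. This is only a model of the reporter's expectation, not of what distcp 0.15.3 actually does; the function name expected_destination and the placeholder user name "u" are assumptions made for illustration.

```python
import posixpath


def expected_destination(src_root, src_file, dst_root):
    """Map a source file to the destination path the reporter expected."""
    # Path of the file relative to the source root, e.g. "1/t".
    relative = posixpath.relpath(src_file, src_root)
    # Re-root that relative path under the destination directory.
    return posixpath.join(dst_root, relative)


if __name__ == "__main__":
    print(expected_destination(
        "/user/u/distcpsrc",
        "/user/u/distcpsrc/1/t",
        "/user/u/distcp_dest",
    ))
```

The observed result, a file named '1' directly under distcp_dest, corresponds to dropping the last path component, which is the discrepancy being reported.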



If I run without '-update', the destination dir is:

hadoop dfs -lsr distcp_dest_noupdate
/user/<user>/distcp_dest_noupdate/1      <dir>           2008-07-24 06:08
<< file 't' is not copied and '1' is a directory



Thanks,

Murali





On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:



Hi,

I am using 0.15.3 and the destination is empty. One more behavior that I am seeing is that if I pass the '-update' option, it writes the content of file '2' to '1' (making the folder '1' a file in the destination). So it looks like it is treating the destination for file distcpsrc/1/2 as distcpdest/1.



Thanks,

Murali



-----Original Message-----
From: Chris Douglas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 23, 2008 1:13 AM
To: core-user@hadoop.apache.org
Subject: Re: distcp skipping the file



There were many fixes and improvements to distcp in 0.16, but most of the critical fixes made it into 0.15.2 and 0.15.3. Is the destination empty? Anything already existing at the destination is skipped. -C



On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:



Hi,



My source folder has a single folder and a single file inside that.



/user/<user>/distcpsrc/1/2 <r 3>   4       2008-07-22 04:22



In the destination, it is creating the folder '1' but not the file '2'.



The counters show 1 file has been skipped.



08/07/22 04:22:36 INFO mapred.JobClient:     Files skipped=1







If I create one more file in any directory under the distcpsrc folder, it copies both files properly. Is this a bug?

[I am using 0.15.3]







Thanks,



Murali







