The -update behavior is by design.
If I am right, -update is meant to overwrite the file at the
destination if it is already there. But in this case it is overwriting
the folder with a file at the destination, which seems to be a bug.
-update will replace the file if the source and destination sizes
differ.
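As a rough illustration of the size rule just described, here is a minimal Python sketch. The function name `should_copy` and its parameters are hypothetical and this is not distcp's actual code; it only models the two behaviors mentioned in this thread: without -update an existing destination is skipped, and with -update it is replaced when the sizes differ.

```python
from typing import Optional


def should_copy(src_size: int, dst_size: Optional[int], update: bool) -> bool:
    """Toy model of the copy decision described in this thread.

    dst_size is None when nothing exists at the destination.
    Illustrative assumption only, not distcp's real implementation.
    """
    if dst_size is None:
        # Nothing at the destination yet: always copy.
        return True
    if not update:
        # Without -update, anything already at the destination is skipped.
        return False
    # With -update, replace only when the sizes differ.
    return src_size != dst_size
```

Note that a rule like this compares sizes only, so two different files of equal length would be treated as identical.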
Copying a single file, particularly with -update, is a corner case for
distcp. It distributes the copy at file granularity, so there is no
advantage to using it in this case. In 0.15, IIRC, -update and
-overwrite control only the "overwrite" argument in the call to create,
which will replace directories when true. Many of these semantic
oddities have been addressed in subsequent versions. The
interpretation of the source and destination paths is slightly
different when either of the two aforementioned options is set, as is
covered in the documentation to be released with 0.18, currently
available in subversion. -C
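To illustrate the point about the "overwrite" flag replacing directories, here is a toy in-memory model in Python. The class `ToyNamespace` and its methods are hypothetical and are not the Hadoop FileSystem API; the assumption that create fails on an existing path when overwrite is false is mine, added only for contrast.

```python
class ToyNamespace:
    """Tiny in-memory stand-in for a file system namespace (hypothetical)."""

    def __init__(self):
        # Maps an absolute path to "dir" or "file".
        self.entries = {}

    def mkdir(self, path: str) -> None:
        self.entries[path] = "dir"

    def create(self, path: str, overwrite: bool) -> None:
        """Create a file at path; with overwrite=True, replace whatever is there."""
        if path in self.entries and not overwrite:
            # Assumed behavior for this sketch: refuse to clobber.
            raise FileExistsError(path)
        # Drop the old entry and, if it was a directory, everything under it.
        prefix = path.rstrip("/") + "/"
        for existing in list(self.entries):
            if existing == path or existing.startswith(prefix):
                del self.entries[existing]
        self.entries[path] = "file"


ns = ToyNamespace()
ns.mkdir("/dest/1")
ns.create("/dest/1", overwrite=True)  # the directory '1' is now a file
```

In this model, a copy that targets the path of an existing directory with overwrite set ends up with a file where the directory used to be, which matches the symptom reported in this thread.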
Could you provide the command line, and the directory structure before
and after issuing the copy? -C
Cmd is:
hadoop distcp -update 'hftp://<srchost>:50070/user/<user>/distcpsrc' distcp_dest
hadoop dfs -lsr distcpsrc
/user/<user>/distcpsrc/1 <dir> 2008-07-24 05:53
/user/<user>/distcpsrc/1/t <r 3> 4 2008-07-22 06:12
hadoop dfs -lsr distcp_dest
/user/<user>/distcp_dest/1 <r 3> 4 2008-07-24 06:03
<< expected /user/<user>/distcp_dest/1/t; the file is copied as '1' instead of '1/t'
If I run without '-update', the destination dir is:
hadoop dfs -lsr distcp_dest_noupdate
/user/<user>/distcp_dest_noupdate/1 <dir> 2008-07-24 06:08
<< file 't' is not copied and '1' is a directory
Thanks,
Murali
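For reference, the layout expected in the report above amounts to keeping each file's path relative to the source root. A minimal Python sketch of that mapping follows; the helper name `expected_destination` is hypothetical, the absolute destination path is assumed from the listing, and this is not how distcp itself computes its targets.

```python
import posixpath


def expected_destination(src_root: str, src_file: str, dst_root: str) -> str:
    """Map a source file to the destination, preserving its path relative to src_root."""
    relative = posixpath.relpath(src_file, start=src_root)
    return posixpath.join(dst_root, relative)


# Placeholder paths taken from the listing; '<user>' is left unfilled on purpose.
target = expected_destination(
    "/user/<user>/distcpsrc",
    "/user/<user>/distcpsrc/1/t",
    "/user/<user>/distcp_dest",
)
print(target)
```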
On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:
Hi,
I am using 0.15.3 and the destination is empty. One more behavior
that I am seeing is that if I pass the '-update' option, it writes the
content of file '2' as '1' (the folder '1' becomes a file in the
destination). So it looks like it is treating the destination for file
distcpsrc/1/2 as distcpdest/1.
Thanks,
Murali
-----Original Message-----
From: Chris Douglas [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 23, 2008 1:13 AM
To: core-user@hadoop.apache.org
Subject: Re: distcp skipping the file
There were many fixes and improvements to distcp in 0.16, but most of
the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
empty? Anything already existing at the destination is skipped. -C
On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
Hi,
My source folder has a single folder and a single file inside that.
/user/<user>/distcpsrc/1/2 <r 3> 4 2008-07-22 04:22
In the destination, it is creating the folder '1' but not the file '2'.
The counters show 1 file has been skipped.
08/07/22 04:22:36 INFO mapred.JobClient: Files skipped=1
If I create one more file in any directory under the distcpsrc folder,
it copies both files properly. Is this a bug?
[I am using 0.15.3]
Thanks,
Murali