Hi Nicholas,
Thanks for the information on this!
Deepika
-----Original Message-----
From: Tsz Wo (Nicholas), Sze [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 05, 2008 4:13 PM
To: core-user@hadoop.apache.org
Subject: Re: rsync on 2 HDFS
Hi Deepika,
We have a utility called distcp (distributed copy). Note that distcp
itself is different from rsync; however, "distcp -delete" is similar to
"rsync --delete".
"distcp -delete" is a new feature in 0.19. See HADOOP-3939. For more
details about distcp, see
http://hadoop.apache.org/core/docs/r0.18.0/distcp.html
(The doc is for 0.18, so it does not mention "distcp -delete". The 0.19
doc will be updated in HADOOP-3942.)
Nicholas Sze
----- Original Message ----
> From: Deepika Khera <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, September 5, 2008 2:42:09 PM
> Subject: rsync on 2 HDFS
>
> Hi,
>
> I wanted to do an "rsync --delete" between directories on two HDFS
> systems. Is there a utility that can do this?
>
> I am aware that HDFS does not allow partial writes. An alternative
> would be to write a program that generates the list of path
> differences, then use distcp to copy the files and delete the
> appropriate files.
>
> Any pointers to implementations (or partial implementations)?
>
> Thanks,
>
> Deepika
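The alternative Deepika outlines (work out the differences between the two directory trees, copy what is missing or changed with distcp, then delete what exists only at the destination) can be sketched roughly as follows. This is a minimal, hypothetical sketch, not part of Hadoop: the class `PathDiff`, its `diff` method, and the input format (maps from a path relative to each root to the file length, assumed to be collected beforehand by a recursive listing of each cluster) are all illustrative assumptions. It compares lengths only, an assumption made because modification times are not necessarily preserved by a copy. It only computes the two lists; feeding the copy list to distcp (for example through its `-f` source-list option) and removing the delete list are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Hadoop): computes the two lists an
// "rsync --delete" style sync needs, given listings already collected from
// each cluster as (path relative to the root -> file length).
public class PathDiff {

    /** Paths to copy from source to destination, and paths to delete at the destination. */
    public static final class Result {
        public final List<String> toCopy = new ArrayList<>();
        public final List<String> toDelete = new ArrayList<>();
    }

    public static Result diff(Map<String, Long> src, Map<String, Long> dst) {
        Result result = new Result();
        // Iterate in sorted order so the output is deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(src).entrySet()) {
            Long dstLength = dst.get(e.getKey());
            // Copy if the file is missing at the destination or its length differs
            // (assumption: length-only comparison, since mtimes may not be preserved).
            if (dstLength == null || !dstLength.equals(e.getValue())) {
                result.toCopy.add(e.getKey());
            }
        }
        // Anything present only at the destination is what "--delete" would remove.
        for (String path : new TreeMap<>(dst).keySet()) {
            if (!src.containsKey(path)) {
                result.toDelete.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> src = new TreeMap<>();
        src.put("logs/a.txt", 100L);
        src.put("logs/b.txt", 200L);
        src.put("logs/c.txt", 300L);

        Map<String, Long> dst = new TreeMap<>();
        dst.put("logs/a.txt", 100L);  // unchanged
        dst.put("logs/b.txt", 150L);  // stale copy
        dst.put("logs/old.txt", 50L); // no longer in the source

        Result r = diff(src, dst);
        System.out.println("copy:   " + r.toCopy);   // copy:   [logs/b.txt, logs/c.txt]
        System.out.println("delete: " + r.toDelete); // delete: [logs/old.txt]
    }
}
```

Keeping the diff separate from the copy and delete steps means the two lists can be reviewed before anything is removed, which matters because the delete step is destructive.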