Hi Deepika,

We have a utility called distcp - distributed copy.  Note that distcp itself is 
different from rsync.  However, "distcp -delete" is similar to "rsync --delete".

"distcp -delete" is a new feature in 0.19; see HADOOP-3939.  For more details 
about distcp, see http://hadoop.apache.org/core/docs/r0.18.0/distcp.html
(That doc is for 0.18, so it won't mention "distcp -delete".  The 0.19 doc will 
be updated in HADOOP-3942.)
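
As a rough sketch of what an "rsync --delete"-style sync could look like once 
you are on 0.19: the helper below only builds and prints the command so you can 
review it before running it.  The function name, namenode hosts, ports and 
paths are placeholders I made up, and I am assuming that -delete has to be 
combined with -update (or -overwrite), so please check the distcp usage 
message of your version first.

```shell
# Build (but do not run) a distcp command that mirrors SRC onto DST.
# Assumption: -delete must be combined with -update or -overwrite.
distcp_sync_cmd() {
  src="$1"
  dst="$2"
  # -update copies only files that are missing at the destination or differ from it;
  # -delete removes destination files that no longer exist in the source.
  echo "hadoop distcp -update -delete $src $dst"
}

# Print the command for two placeholder clusters; review it, then run it by hand.
distcp_sync_cmd hdfs://nn1:8020/data/src hdfs://nn2:8020/data/dst
```

Since -delete removes files on the destination, it is probably worth trying 
this on a scratch directory before pointing it at real data.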

Nicholas Sze




----- Original Message ----
> From: Deepika Khera <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, September 5, 2008 2:42:09 PM
> Subject: rsync on 2 HDFS
> 
> Hi,
> 
> 
> 
> I wanted to do an "rsync --delete" between data in 2 HDFS system
> directories. Do we have a utility that could do this?
> 
> 
> 
> I am aware that HDFS does not allow partial writes. An alternative would
> be to write a program to generate the list of differences in paths and
> then use distcp to copy the files and delete the appropriate files.
> 
> 
> 
> Any pointers to implementations (or partial implementations)?
> 
> 
> 
> Thanks,
> 
> Deepika
