Hi Deepika,

We have a utility called distcp (distributed copy). Note that distcp itself is different from rsync. However, "distcp -delete" is similar to "rsync --delete".
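To illustrate what the "delete" part adds on top of a plain copy, here is a minimal Python sketch of planning a one-way sync. It is not distcp's code. The helper name `plan_sync` and the listing format (relative path mapped to a hypothetical (size, checksum) tuple) are assumptions for illustration, and distcp's own rules for deciding that a file has changed may differ:

```python
from typing import Dict, List, Tuple

# Hypothetical listing format: relative path -> (size, checksum).
# How such listings are obtained from each HDFS is out of scope here.
Listing = Dict[str, Tuple[int, str]]


def plan_sync(src: Listing, dst: Listing) -> Tuple[List[str], List[str]]:
    """Plan a one-way sync of src onto dst, rsync --delete style.

    Returns (to_copy, to_delete):
      to_copy   - paths missing from dst, or whose metadata differs from src
      to_delete - paths present in dst but absent from src (the "delete" part)
    """
    # dst.get(p) is None for missing paths, so they are always copied.
    to_copy = sorted(p for p, meta in src.items() if dst.get(p) != meta)
    # Without a delete step, these stale files would stay in dst forever.
    to_delete = sorted(p for p in dst if p not in src)
    return to_copy, to_delete


if __name__ == "__main__":
    src = {"a.txt": (10, "c1"), "b.txt": (5, "c2")}
    dst = {"b.txt": (5, "c9"), "c.txt": (7, "c3")}
    print(plan_sync(src, dst))  # (['a.txt', 'b.txt'], ['c.txt'])
```

A plain copy only ever acts on the first list; the delete option is what handles the second.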
"distcp -delete" is a new feature in 0.19. See HADOOP-3939. For more details about distcp, see http://hadoop.apache.org/core/docs/r0.18.0/distcp.html (the doc is for 0.18, so it won't mention "distcp -delete"; the 0.19 doc will be updated in HADOOP-3942).

Nicholas Sze

----- Original Message ----
> From: Deepika Khera <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, September 5, 2008 2:42:09 PM
> Subject: rsync on 2 HDFS
>
> Hi,
>
> I wanted to do an "rsync --delete" between data in 2 HDFS system
> directories. Do we have a utility that could do this?
>
> I am aware that HDFS does not allow partial writes. An alternative would
> be to write a program to generate the list of differences in paths and
> then use distcp to copy the files and delete the appropriate files.
>
> Any pointers to implementations (or partial implementations)?
>
> Thanks,
>
> Deepika