RE: rsync on 2 HDFS

2008-09-09 Thread Deepika Khera
Hi Nicholas,

Thanks for the information on this!

Deepika

-----Original Message-----
From: Tsz Wo (Nicholas), Sze [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 05, 2008 4:13 PM
To: core-user@hadoop.apache.org
Subject: Re: rsync on 2 HDFS

Hi Deepika,

We have a utility called distcp - distributed copy.  Note that distcp
itself is different from rsync.  However, "distcp -delete" is similar to
"rsync --delete".

"distcp -delete" is a new feature in 0.19.  See HADOOP-3939.  For more
details about distcp, see
http://hadoop.apache.org/core/docs/r0.18.0/distcp.html
(the doc is for 0.18, so it won't mention "distcp -delete".  The 0.19
doc will be updated in HADOOP-3942.)

Nicholas Sze

----- Original Message ----
> From: Deepika Khera <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, September 5, 2008 2:42:09 PM
> Subject: rsync on 2 HDFS
> 
> Hi,
> 
> I wanted to do an "rsync --delete" between directories on two HDFS
> systems. Do we have a utility that could do this?
> 
> I am aware that HDFS does not allow partial writes. An alternative would
> be to write a program to generate the list of differences in paths and
> then use distcp to copy the files and delete the appropriate files.
> 
> Any pointers to implementations (or partial implementations)?
> 
> Thanks,
> 
> Deepika
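The diff-list approach described in the question can be sketched as follows. This is a minimal illustration under stated assumptions, not part of distcp: the function `plan_sync` and the (length, checksum) fingerprint are hypothetical, and the two listings are plain dictionaries standing in for recursive listings of the source and destination directories (in practice these might be gathered through the Hadoop `FileSystem` API). Because HDFS files cannot be modified in place, a changed file goes on the copy list and is recopied whole.

```python
from typing import Dict, List, Tuple

# Hypothetical per-file fingerprint: (length_in_bytes, checksum_string).
# Modification times are deliberately not used: unless the copy preserves
# them, every previously copied file would look "changed" on the next run.
FileMeta = Tuple[int, str]


def plan_sync(src: Dict[str, FileMeta],
              dst: Dict[str, FileMeta]) -> Tuple[List[str], List[str]]:
    """Return (to_copy, to_delete) so that dst mirrors src, rsync --delete style.

    Keys are paths relative to the source and destination root directories.
    A file is copied whole if it is missing from dst or its fingerprint differs.
    A file is deleted if it exists in dst but not in src.
    """
    to_copy = sorted(path for path, meta in src.items() if dst.get(path) != meta)
    to_delete = sorted(path for path in dst if path not in src)
    return to_copy, to_delete


if __name__ == "__main__":
    src = {"a.txt": (10, "c1"), "logs/b.log": (20, "c2"), "c.dat": (5, "c3")}
    dst = {"a.txt": (10, "c1"), "logs/b.log": (18, "old"), "stale.tmp": (1, "c9")}
    to_copy, to_delete = plan_sync(src, dst)
    print(to_copy)    # ['c.dat', 'logs/b.log']
    print(to_delete)  # ['stale.tmp']
```

The copy list could then be handed to distcp and the delete list applied on the destination cluster. On 0.19 and later, "distcp -delete" (HADOOP-3939) should make the hand-rolled delete list unnecessary.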



Re: rsync on 2 HDFS

2008-09-05 Thread Tsz Wo (Nicholas), Sze
Hi Deepika,

We have a utility called distcp - distributed copy.  Note that distcp itself is 
different from rsync.  However, "distcp -delete" is similar to "rsync --delete".

"distcp -delete" is a new feature in 0.19.  See HADOOP-3939.  For more details 
about distcp, see http://hadoop.apache.org/core/docs/r0.18.0/distcp.html
(the doc is for 0.18, so it won't mention "distcp -delete".  The 0.19 doc will 
be updated in HADOOP-3942.)

Nicholas Sze