[ https://issues.apache.org/jira/browse/HDFS-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510391#comment-15510391 ]
Yongjun Zhang edited comment on HDFS-10314 at 9/21/16 4:11 PM:
---------------------------------------------------------------

Hi [~jingzhao],

For clarity, and as a recap, here is a comparison table between -diff and the proposed -rdiff, which shows the symmetry:

||Comparison||-diff s1 s2 <src> <tgt>||-rdiff s2 s1 <src> <tgt>||
|Current feature state|Existing in distcp|Proposed addition|
|Functionality|Given <tgt>'s current state is s1, make <tgt>'s current state the same as newer snapshot s2|Given <tgt>'s current state is s2, make <tgt>'s current state the same as older snapshot s1|
|Requirements|# <src> and <tgt> need to be different paths
# both <src> and <tgt> have snapshot s1 with exact same content
# <src> has snapshot s2
# s2 is newer than s1
# <tgt>'s current state is the same as s1
# <tgt> doesn't have snapshot s2|# <src> and <tgt> can be the same or different paths
# both <src> and <tgt> have snapshot s1 with exact same content
# <tgt> has snapshot s2
# s2 is newer than s1
# <tgt>'s current state is the same as s2
# <src> may or may not have snapshot s2|
|Steps|# calculate snapshotDiff<s1,s2> at <src>
# apply rename/delete part of snapshotDiff on <tgt>
# copy modified part of snapshotDiff from s2 of <src> to <tgt>|# calculate snapshotDiff<s2,s1> at <tgt>
# apply rename/delete part of snapshotDiff on <tgt>
# copy modified part of snapshotDiff from s1 of <src> to <tgt>|

The original thinking was to add -rdiff to distcp (solution A), but because of concerns about confusing semantics, the suggestion is to introduce a new command here instead (solution B).

Thanks.

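
To make the symmetry in the Steps row concrete, here is a minimal sketch on a toy in-memory model. It is an illustration only, not distcp code: each snapshot is a plain dict mapping a path to its content, the names `snapshot_diff` and `sync` are hypothetical, and renames are simplified to a delete plus a create. The only difference between the two directions is which snapshot pair the diff is computed over and which snapshot the modified data is copied from.

```python
def snapshot_diff(from_snap, to_snap):
    """Return (deleted, changed) paths needed to turn from_snap into to_snap.

    Toy stand-in for snapshotDiff<from,to>; a rename shows up here as
    one deleted path plus one changed (created) path.
    """
    deleted = sorted(p for p in from_snap if p not in to_snap)
    changed = sorted(p for p, data in to_snap.items() if from_snap.get(p) != data)
    return deleted, changed


def sync(target, diff_from, diff_to, copy_source):
    """Bring `target` (which must currently equal diff_from) to diff_to."""
    # Requirement from the table: the target's current state matches the
    # snapshot the diff starts from.
    if target != diff_from:
        raise ValueError("target has changed since the starting snapshot")
    deleted, changed = snapshot_diff(diff_from, diff_to)  # step 1
    for path in deleted:                                  # step 2 (delete part)
        del target[path]
    for path in changed:                                  # step 3 (copy modified)
        target[path] = copy_source[path]
    return target


s1 = {"/a": "v1", "/b": "v1"}
s2 = {"/a": "v2", "/c": "v1"}

# -diff s1 s2: target is at s1, modified data is copied from s2.
forward = sync(dict(s1), s1, s2, copy_source=s2)

# -rdiff s2 s1: target is at s2, modified data is copied from s1.
reverse = sync(dict(s2), s2, s1, copy_source=s1)
```

In this toy model `forward` ends up equal to `s2` and `reverse` ends up equal to `s1`, which is the behaviour the Functionality row describes for each switch.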
was (Author: yzhangal):
Hi [~jingzhao],

For clarity, and as a recap, here is a comparison table between -diff and the proposed -rdiff, which shows the symmetry:

||Comparison||-diff s1 s2 <src> <tgt>||-rdiff s2 s1 <src> <tgt>||
|Current feature state|Existing in distcp|Proposed addition|
|Functionality|Given <tgt>'s current state is s1, make <tgt>'s current state the same as newer snapshot s2|Given <tgt>'s current state is s2, make <tgt>'s current state the same as older snapshot s1|
|Requirements|# <src> and <tgt> need to be different paths
# both <src> and <tgt> have snapshot s1 with exact same content
# <src> has snapshot s2
# s2 is newer than s1
# <tgt>'s current state is the same as s1
# <tgt> doesn't have snapshot s2|# <src> and <tgt> can be the same or different paths
# both <src> and <tgt> have snapshot s1 with exact same content
# <tgt> has snapshot s2
# s2 is newer than s1
# <tgt>'s current state is the same as s2
# <src> may or may not have snapshot s2|
|Steps|# calculate snapshotDiff<s1,s2> at <src>
# apply rename/delete part of snapshotDiff on <tgt>
# copy modified part of snapshotDiff from s1 of <src> to <tgt>|# calculate snapshotDiff<s2,s1> at <tgt>
# apply rename/delete part of snapshotDiff on <tgt>
# copy modified part of snapshotDiff from s1 of <src> to <tgt>|

The original thinking was to add -rdiff to distcp (solution A), but because of concerns about confusing semantics, the suggestion is to introduce a new command here instead (solution B).

Thanks.

> A new tool to sync current HDFS view to specified snapshot
> ----------------------------------------------------------
>
>                 Key: HDFS-10314
>                 URL: https://issues.apache.org/jira/browse/HDFS-10314
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-10314.001.patch
>
>
> HDFS-9820 proposed adding -rdiff switch to distcp, as a reversed operation of
> -diff switch.
> Upon discussion with [~jingzhao], we will introduce a new tool that wraps
> around distcp to achieve the same purpose.
> I'm thinking about calling the new tool "rsync", similar to unix/linux
> command "rsync". The "r" here means remote.
> The syntax that simulates the -rdiff behavior proposed in HDFS-9820 is
> {code}
> rsync <fromSnapshotName> <toSnapshotName> <source> <target>
> {code}
> This command ensures <fromSnapshotName> is newer than <toSnapshotName>.
> I think that, in the future, we can add another command to provide the
> functionality of the -diff switch of distcp:
> {code}
> sync <fromSnapshotName> <toSnapshotName> <source> <target>
> {code}
> This command ensures <fromSnapshotName> is older than <toSnapshotName>.
> Thanks [~jingzhao].
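
The ordering rule for the two proposed commands can be sketched as a small check. This is a hypothetical illustration, not part of the attached patch: the function name `validate_snapshot_order` is made up, and it assumes the caller has already obtained each snapshot's creation time by some means.

```python
from datetime import datetime


def validate_snapshot_order(command, from_created, to_created):
    """Raise ValueError if the snapshot order does not match the command.

    `command` is "rsync" or "sync" as proposed in the issue description;
    the two timestamps are the (assumed known) snapshot creation times.
    """
    if command == "rsync":
        # Proposed rsync: <fromSnapshotName> must be newer than <toSnapshotName>.
        ok = from_created > to_created
        expected = "newer"
    elif command == "sync":
        # Proposed sync: <fromSnapshotName> must be older than <toSnapshotName>.
        ok = from_created < to_created
        expected = "older"
    else:
        raise ValueError(f"unknown command: {command}")
    if not ok:
        raise ValueError(
            f"{command}: <fromSnapshotName> must be {expected} than <toSnapshotName>"
        )


# Illustrative creation times for snapshots s1 (older) and s2 (newer).
s1_created = datetime(2016, 9, 1)
s2_created = datetime(2016, 9, 20)

validate_snapshot_order("rsync", s2_created, s1_created)  # roll back from s2 to s1
validate_snapshot_order("sync", s1_created, s2_created)   # roll forward from s1 to s2
```

Passing the snapshots in the wrong order for either command raises `ValueError`, which mirrors the "ensures" wording in the description.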