Repository: oozie
Updated Branches:
  refs/heads/master 8bb40f3fa -> e9dc8e2e5


OOZIE-2920 Document Distcp can copy files within a cluster (Artem Ervits via rkanter)


Project: http://git-wip-us.apache.org/repos/asf/oozie/repo
Commit: http://git-wip-us.apache.org/repos/asf/oozie/commit/e9dc8e2e
Tree: http://git-wip-us.apache.org/repos/asf/oozie/tree/e9dc8e2e
Diff: http://git-wip-us.apache.org/repos/asf/oozie/diff/e9dc8e2e

Branch: refs/heads/master
Commit: e9dc8e2e51e8cbab9d3f6a1830a5bf4aee839a71
Parents: 8bb40f3
Author: Robert Kanter <rkan...@apache.org>
Authored: Tue Jun 20 09:46:42 2017 -0700
Committer: Robert Kanter <rkan...@apache.org>
Committed: Tue Jun 20 09:46:42 2017 -0700

----------------------------------------------------------------------
 .../site/twiki/DG_DistCpActionExtension.twiki   | 35 +++++++++++++++++---
 .../src/site/twiki/WorkflowFunctionalSpec.twiki |  7 ++--
 release-log.txt                                 |  1 +
 3 files changed, 37 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/oozie/blob/e9dc8e2e/docs/src/site/twiki/DG_DistCpActionExtension.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/DG_DistCpActionExtension.twiki b/docs/src/site/twiki/DG_DistCpActionExtension.twiki
index 9931a04..260cd25 100644
--- a/docs/src/site/twiki/DG_DistCpActionExtension.twiki
+++ b/docs/src/site/twiki/DG_DistCpActionExtension.twiki
@@ -12,7 +12,9 @@
 
 The =DistCp= action uses Hadoop distributed copy to copy files from one cluster to another or within the same cluster.
 
-*IMPORTANT:* The DistCp action may not work properly with all configurations (secure, insecure) in all versions of Hadoop.
+*IMPORTANT:* The DistCp action may not work properly with all configurations (secure, insecure) in all versions
+of Hadoop. For example, distcp between two secure clusters is tested and works well. Same is true with two insecure
+clusters. In cases where a secure and insecure clusters are involved, distcp will not work.
 
 Both Hadoop clusters have to be configured with proxyuser for the Oozie process as explained
 [[DG_QuickStart#HadoopProxyUser][here]] on the Quick Start page.
@@ -22,15 +24,15 @@ Both Hadoop clusters have to be configured with proxyuser for the Oozie process
 <verbatim>
 <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.4">
     ...
-    <action name="[NODE-NAME]">
+    <action name="distcp-example">
         <distcp xmlns="uri:oozie:distcp-action:0.2">
             <job-tracker>${jobTracker}</job-tracker>
             <name-node>${nameNode1}</name-node>
             <arg>${nameNode1}/path/to/input.txt</arg>
             <arg>${nameNode2}/path/to/output.txt</arg>
             </distcp>
-        <ok to="[NODE-NAME]"/>
-        <error to="[NODE-NAME]"/>
+        <ok to="end"/>
+        <error to="fail"/>
     </action>
     ...
 </workflow-app>
@@ -48,6 +50,31 @@ the action:
 </property>
 </verbatim>
 
+The =DistCp= action is also commonly used to copy files within the same cluster. Cases where copying files within
+a directory to another directory or directories to target directory is supported. Example below will illustrate a
+copy within a cluster, notice the source and target =nameNode= is the same and use of =*= syntax is supported to
+represent only child files or directories within a source directory. For the sake of the example, =jobTracker= and
+=resourceManager= are synonymous.
+
+*Syntax:*
+
+<verbatim>
+<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.4">
+    ...
+    <action name="copy-example">
+        <distcp xmlns="uri:oozie:distcp-action:0.2">
+            <job-tracker>${resourceManager}</job-tracker>
+            <name-node>${nameNode}</name-node>
+            <arg>${nameNode}/path/to/source/*</arg>
+            <arg>${nameNode}/path/to/target/</arg>
+        </distcp>
+        <ok to="end"/>
+        <error to="fail"/>
+    </action>
+    ...
+</workflow-app>
+</verbatim>
+
 ---++ Appendix, DistCp XML-Schema
 
 ---+++ AE.A Appendix A, DistCp XML-Schema

http://git-wip-us.apache.org/repos/asf/oozie/blob/e9dc8e2e/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/WorkflowFunctionalSpec.twiki b/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
index 5a75b99..6bd3e5a 100644
--- a/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
+++ b/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
@@ -1148,9 +1148,12 @@ Path names specified in the =fs= action can be parameterized (templatized) using
 Path name should be specified as a absolute path. In case of =move=, =delete=, =chmod= and =chgrp= commands, a glob pattern can also be specified instead of an absolute path.
 For =move=, glob pattern can only be specified for source path and not the target.
 
-Each file path must specify the file system URI, for move operations, the target must not specified the system URI.
+Each file path must specify the file system URI, for move operations, the target must not specify the system URI.
 
-IMPORTANT: All the commands within =fs= action do not happen atomically, if a =fs= action fails half way in the
+*IMPORTANT:* For the purposes of copying files within a cluster it is recommended to refer to the =distcp= action
+instead. Refer to [[DG_DistCpActionExtension][=distcp=]] action to copy files within a cluster.
+
+*IMPORTANT:* All the commands within =fs= action do not happen atomically, if a =fs= action fails half way in the
 commands being executed, successfully executed commands are not rolled back. The =fs= action, before executing any
 command must check that source paths exist and target paths don't exist (constraint regarding target relaxed for the =move= action. See below for details), thus failing before executing any command.
 Therefore the validity of all paths specified in one =fs= action are evaluated before any of the file operation are

http://git-wip-us.apache.org/repos/asf/oozie/blob/e9dc8e2e/release-log.txt
----------------------------------------------------------------------
diff --git a/release-log.txt b/release-log.txt
index cfc94e9..102c292 100644
--- a/release-log.txt
+++ b/release-log.txt
@@ -1,5 +1,6 @@
 -- Oozie 5.0.0 release (trunk - unreleased)
 
+OOZIE-2920 Document Distcp can copy files within a cluster (Artem Ervits via rkanter)
 OOZIE-2796 oozie.action.keep.action.dir not getting notice (zgengxb2005 via gezapeti)
 OOZIE-2769 Extend FS action to allow setrep on a file (Artem Ervits via gezapeti)
 OOZIE-2815 amend - Oozie not always display job log (andras.piros via gezapeti)
