-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/
-----------------------------------------------------------

(Updated Nov. 21, 2017, 3:49 p.m.)


Review request for oozie, Peter Bacsko and Robert Kanter.


Repository: oozie-git


Description
-------

Before Oozie on YARN, the MapReduce ``JobSubmitter`` (more precisely 
``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation 
tokens for the HDFS NameNodes listed in 
``oozie.launcher.mapreduce.job.hdfs-servers`` before the Oozie launcher job 
was submitted.

The Oozie launcher is now a YARN Application Master. It needs HDFS delegation 
tokens to be able to copy files between secure clusters via the Oozie DistCp 
action.

Changes:
- ``JavaActionExecutor`` was modified to handle the DistCp-related parameters 
``oozie.launcher.mapreduce.job.hdfs-servers`` and 
``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``.
- ``HDFSCredentials`` was changed to reuse 
``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens 
(see the sketch below).
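
To give a rough idea of the token handling, here is a minimal sketch as a 
standalone helper. The class and method names are made up for illustration; 
only ``TokenCache.obtainTokensForNamenodes`` and the two property names come 
from the change itself, and the stripping of the ``oozie.launcher.`` prefix is 
an assumption:

```
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.security.TokenCache;
import org.apache.hadoop.security.Credentials;

// Hypothetical helper -- the real logic lives in HDFSCredentials and
// JavaActionExecutor.
public final class HdfsDelegationTokenSketch {

    private static final String HDFS_SERVERS =
            "oozie.launcher.mapreduce.job.hdfs-servers";
    private static final String TOKEN_RENEWAL_EXCLUDE =
            "oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude";

    private HdfsDelegationTokenSketch() {
    }

    /**
     * Obtains HDFS delegation tokens for every NameNode listed in
     * oozie.launcher.mapreduce.job.hdfs-servers and adds them to the given
     * Credentials, so the launcher AM can access remote secure clusters.
     */
    public static void obtainTokensForHdfsServers(Credentials credentials, Configuration conf)
            throws IOException {
        String hdfsServers = conf.get(HDFS_SERVERS);
        if (hdfsServers == null || hdfsServers.isEmpty()) {
            return;
        }
        // Assumption: the oozie.launcher. prefix is stripped and the exclude list is
        // passed on under the plain MapReduce property name, so token renewal is
        // skipped for the remote cluster that the local ResourceManager cannot reach.
        String exclude = conf.get(TOKEN_RENEWAL_EXCLUDE);
        if (exclude != null) {
            conf.set("mapreduce.job.hdfs-servers.token-renewal.exclude", exclude);
        }
        String[] servers = hdfsServers.split(",");
        Path[] paths = new Path[servers.length];
        for (int i = 0; i < servers.length; i++) {
            paths[i] = new Path(servers[i].trim());
        }
        // Reuse the MapReduce helper that JobSubmitter relied on before Oozie on YARN.
        TokenCache.obtainTokensForNamenodes(credentials, paths, conf);
    }
}
```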


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 
92a7ebe9a7876b6400d80356d5c826e77575e2ab 
  core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java 
a1df304914b73d406e986409a8053c2a48e1bd38 


Diff: https://reviews.apache.org/r/63875/diff/3/

Changes: https://reviews.apache.org/r/63875/diff/2-3/


Testing
-------

Tested on a secure cluster that the Oozie DistCp action can copy a file from 
another secure cluster that uses a different Kerberos realm.

- workflow:
```
<workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
    <start to="distcp-3a1f"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="distcp-3a1f">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
                    <value>remote.test2.com</value>
                </property>
            </configuration>
            <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
            <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
        </distcp>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```

Prior to executing the workflow I had to set up cross-realm trust between the 
test secure clusters. It involved:
- changing the Kerberos configuration ``/etc/krb5.conf`` (adding the realms and 
setting additional properties like ``udp_preference_limit = 1``)
- regenerating service credentials
- changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) 
and adding a ``hadoop.security.auth_to_local`` rule like 
``RULE:[2:$1](.*)s/(.*)/$1/g`` (see the sketch after this list)
- additional configuration to enable trust between the test Hadoop clusters
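
For reference, a sketch of what the corresponding client-side Hadoop settings 
could look like on the test clusters; the values below are taken from the 
pattern and rule mentioned above, everything else about the actual test 
configuration is omitted:

```
<!-- hdfs-site.xml / core-site.xml fragments (illustrative values only) -->
<property>
    <name>dfs.namenode.kerberos.principal.pattern</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.security.auth_to_local</name>
    <value>
        RULE:[2:$1](.*)s/(.*)/$1/g
        DEFAULT
    </value>
</property>
```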


Thanks,

Attila Sasvari
