Added: oozie/site/trunk/content/resources/docs/5.2.1/AG_Install.html
URL: 
http://svn.apache.org/viewvc/oozie/site/trunk/content/resources/docs/5.2.1/AG_Install.html?rev=1886952&view=auto
==============================================================================
--- oozie/site/trunk/content/resources/docs/5.2.1/AG_Install.html (added)
+++ oozie/site/trunk/content/resources/docs/5.2.1/AG_Install.html Fri Feb 26 
14:14:19 2021
@@ -0,0 +1,1405 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2021-02-26 
+ | Rendered using Apache Maven Fluido Skin 1.4
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20210226" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Oozie &#x2013; </title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" 
src="./js/apache-maven-fluido-1.4.min.js"></script>
+
+    
+                  </head>
+        <body class="topBarDisabled">
+          
+        
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                    <a href="https://oozie.apache.org/" 
id="bannerLeft">
+                                                                               
         <img src="https://oozie.apache.org/images/oozie_200x.png"  
alt="Oozie"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org/" class="externalLink" 
title="Apache">
+        Apache</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="../../" title="Oozie">
+        Oozie</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="../" title="docs">
+        docs</a>
+                    <span class="divider">/</span>
+      </li>
+                <li class="">
+                    <a href="./" title="5.2.1">
+        5.2.1</a>
+                    <span class="divider">/</span>
+      </li>
+        <li class="active "></li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right"><span 
class="divider">|</span> Last Published: 2021-02-26</li>
+              <li id="projectVersion" class="pull-right">
+                    Version: 5.2.1
+        </li>
+            
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span2">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+  </ul>
+                
+                    
+                
+          <hr />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built 
by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" 
src="./images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span10" >
+                                  
+            <p><a href="index.html">::Go back to Oozie Documentation 
Index::</a></p>
+<h1>Oozie Installation and Configuration</h1>
+<ul>
+<li><a href="#Basic_Setup">Basic Setup</a></li>
+<li><a href="#Environment_Setup">Environment Setup</a></li>
+<li><a href="#Oozie_Server_Setup">Oozie Server Setup</a>
+<ul>
+<li><a href="#Setting_Up_Oozie_with_an_Alternate_Tomcat">Setting Up Oozie with 
an Alternate Tomcat</a></li></ul></li>
+<li><a href="#Database_Configuration">Database Configuration</a></li>
+<li><a href="#Database_Migration">Database Migration</a></li>
+<li><a href="#Oozie_Configuration">Oozie Configuration</a>
+<ul>
+<li><a href="#Oozie_Configuration_Properties">Oozie Configuration 
Properties</a></li>
+<li><a href="#Precedence_of_Configuration_Properties">Precedence of 
Configuration Properties</a>
+<ul>
+<li><a href="#Overriding_Configuration_Values">Overriding Configuration 
Values</a></li>
+<li><a href="#Prepending_Configuration_Values">Prepending Configuration 
Values</a></li></ul></li>
+<li><a href="#Logging_Configuration">Logging Configuration</a></li>
+<li><a href="#Oozie_User_Authentication_Configuration">Oozie User 
Authentication Configuration</a></li>
+<li><a href="#Oozie_Hadoop_Authentication_Configuration">Oozie Hadoop 
Authentication Configuration</a></li>
+<li><a href="#User_ProxyUser_Configuration">User ProxyUser 
Configuration</a></li>
+<li><a href="#User_Authorization_Configuration">User Authorization 
Configuration</a>
+<ul>
+<li><a href="#Defining_Admin_Users">Defining Admin Users</a></li>
+<li><a href="#Defining_Access_Control_Lists">Defining Access Control 
Lists</a></li></ul></li>
+<li><a href="#Oozie_System_ID_Configuration">Oozie System ID 
Configuration</a></li>
+<li><a href="#Filesystem_Configuration">Filesystem Configuration</a></li>
+<li><a href="#HCatalog_Configuration">HCatalog Configuration</a></li>
+<li><a href="#Notifications_Configuration">Notifications Configuration</a></li>
+<li><a href="#Setting_Up_Oozie_with_HTTPS_SSL">Setting Up Oozie with HTTPS 
(SSL)</a>
+<ul>
+<li><a href="#To_use_a_Self-Signed_Certificate">To use a Self-Signed 
Certificate</a></li>
+<li><a href="#To_use_a_Certificate_from_a_Certificate_Authority">To use a 
Certificate from a Certificate Authority</a></li>
+<li><a href="#Configure_the_Oozie_Server_to_use_SSL_HTTPS">Configure the Oozie 
Server to use SSL (HTTPS)</a></li>
+<li><a href="#Configure_the_Oozie_Client_to_connect_using_SSL_HTTPS">Configure 
the Oozie Client to connect using SSL (HTTPS)</a></li>
+<li><a href="#Connect_to_the_Oozie_Web_UI_using_SSL_HTTPS">Connect to the 
Oozie Web UI using SSL (HTTPS)</a></li>
+<li><a href="#Additional_considerations_for_Oozie_HA_with_SSL">Additional 
considerations for Oozie HA with SSL</a></li></ul></li>
+<li><a href="#Fine_Tuning_an_Oozie_Server">Fine Tuning an Oozie Server</a></li>
+<li><a href="#Using_Instrumentation_instead_of_Metrics">Using Instrumentation 
instead of Metrics</a></li>
+<li><a href="#High_Availability_HA">High Availability (HA)</a>
+<ul>
+<li><a href="#Pre-requisites">Pre-requisites</a></li>
+<li><a href="#InstallationConfiguration_Steps">Installation/Configuration 
Steps</a></li>
+<li><a href="#Security">Security</a></li>
+<li><a href="#JobId_sequence">JobId sequence</a></li></ul></li></ul></li>
+<li><a href="#Starting_and_Stopping_Oozie">Starting and Stopping Oozie</a></li>
+<li><a href="#Oozie_Command_Line_Installation">Oozie Command Line 
Installation</a></li>
+<li><a href="#Oozie_Share_Lib">Oozie Share Lib</a></li>
+<li><a href="#Oozie_CoordinatorsBundles_Processing_Timezone">Oozie 
Coordinators/Bundles Processing Timezone</a></li>
+<li><a href="#MapReduce_Workflow_Uber_Jars">MapReduce Workflow Uber 
Jars</a></li>
+<li><a href="#AdvancedCustom_Environment_Settings">Advanced/Custom Environment 
Settings</a></li>
+<li><a href="#Oozie_behind_a_trusted_proxy">Oozie behind a trusted 
proxy</a></li></ul>
+
+<div class="section">
+<h2><a name="Basic_Setup"></a>Basic Setup</h2>
+<p>Follow the instructions at <a href="DG_QuickStart.html">Oozie Quick 
Start</a>.</p></div>
+<div class="section">
+<h2><a name="Environment_Setup"></a>Environment Setup</h2>
+<p><b>IMPORTANT:</b> Oozie ignores any value set for <tt>OOZIE_HOME</tt>; 
it computes its home directory automatically.</p>
+<p>When running Oozie with its embedded Jetty server, the 
<tt>conf/oozie-env.sh</tt> file can be used to configure the following 
environment variables used by Oozie:</p>
+<p><b>JETTY_OPTS</b> : settings for the Embedded Jetty that runs Oozie. Java 
System properties for Oozie should be specified in this variable. No default 
value.</p>
+<p><b>OOZIE_CONFIG_FILE</b> : Oozie configuration file to load from Oozie 
configuration directory. Default value <tt>oozie-site.xml</tt>.</p>
+<p><b>OOZIE_LOGS</b> : Oozie logs directory. Default value <tt>logs/</tt> 
directory in the Oozie installation directory.</p>
+<p><b>OOZIE_LOG4J_FILE</b> :  Oozie Log4J configuration file to load from 
Oozie configuration directory. Default value 
<tt>oozie-log4j.properties</tt>.</p>
+<p><b>OOZIE_LOG4J_RELOAD</b> : Reload interval of the Log4J configuration 
file, in seconds. Default value <tt>10</tt>.</p>
+<p><b>OOZIE_CHECK_OWNER</b> : If set to <tt>true</tt>, Oozie 
setup/start/run/stop scripts will check that the owner of the Oozie 
installation directory matches the user invoking the script. The default value 
is undefined and interpreted as <tt>false</tt>.</p>
+<p><b>OOZIE_INSTANCE_ID</b> : The instance id of the Oozie server.  When using 
HA, each server instance should have a unique instance id. Default value 
<tt>${OOZIE_HTTP_HOSTNAME}</tt></p></div>
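+<p>As a minimal sketch of the variables above, a <tt>conf/oozie-env.sh</tt> 
fragment might look like the following. The directory and the numeric values 
are illustrative assumptions, not recommended settings:</p>
+
+<div>
+<div>
+<pre class="source"># conf/oozie-env.sh (illustrative values only)
+
+# Write Oozie logs to a dedicated directory (hypothetical path).
+export OOZIE_LOGS=/var/log/oozie
+
+# Re-read the Log4J configuration every 30 seconds instead of every 10.
+export OOZIE_LOG4J_RELOAD=30
+
+# Make the scripts check that the invoking user owns the installation.
+export OOZIE_CHECK_OWNER=true
+</pre></div></div>
+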
+<div class="section">
+<h2><a name="Oozie_Server_Setup"></a>Oozie Server Setup</h2>
+<p>The <tt>oozie-setup.sh</tt> script prepares the embedded Jetty server to 
run Oozie.</p>
+<p>The <tt>oozie-setup.sh</tt> script options are:</p>
+
+<div>
+<div>
+<pre class="source">Usage  : oozie-setup.sh &lt;Command and OPTIONS&gt;
+          sharelib create -fs FS_URI [-locallib SHARED_LIBRARY] [-extralib 
EXTRA_SHARED_LIBRARY] [-concurrency CONCURRENCY]
+                                                                (create 
sharelib for oozie,
+                                                                FS_URI is the 
fs.default.name
+                                                                for hdfs uri; 
SHARED_LIBRARY, path to the
+                                                                Oozie sharelib 
to install, it can be a tarball
+                                                                or an expanded 
version of it. If omitted,
+                                                                the Oozie 
sharelib tarball from the Oozie
+                                                                installation 
directory will be used.
+                                                                
EXTRA_SHARED_LIBRARY represents extra sharelib resources.
+                                                                This option 
requires a pair of sharelibname
+                                                                and 
comma-separated list of pathnames in the following format:
+                                                                
sharelib-name=path-name-1,path-name-2
+                                                                In case of 
more than one sharelib, this option can be specified
+                                                                multiple times.
+                                                                CONCURRENCY is 
a number of threads to be used
+                                                                for copy 
operations.
+                                                                By default 1 
thread will be used)
+                                                                (action fails 
if sharelib is already installed
+                                                                in HDFS)
+          sharelib upgrade -fs FS_URI [-locallib SHARED_LIBRARY] 
([deprecated][use create command to create new version]
+                                                                  upgrade 
existing sharelib, fails if there
+                                                                  is no 
existing sharelib installed in HDFS)
+          db create|upgrade|postupgrade -run [-sqlfile &lt;FILE&gt;] (create, 
upgrade or postupgrade oozie db with an
+                                                                optional sql 
File)
+          export &lt;file&gt;                                         exports 
the oozie database to the specified
+                                                                file in zip 
format
+          import &lt;file&gt;                                         imports 
the oozie database from the zip file
+                                                                created by 
export
+          (without options prints this usage information)
+</pre></div></div>
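+
+<p>For illustration, the invocations below exercise some of the options listed 
above. The NameNode URI, the local paths and the sharelib name <tt>mylib</tt> 
are hypothetical placeholders. The two <tt>sharelib create</tt> commands are 
alternatives, since the action fails if a sharelib is already installed in 
HDFS:</p>
+
+<div>
+<div>
+<pre class="source"># Install the bundled sharelib tarball, copying with 4 threads.
+$ bin/oozie-setup.sh sharelib create -fs hdfs://namenode.example.com:8020 -concurrency 4
+
+# Alternative: install a specific tarball plus one extra sharelib.
+$ bin/oozie-setup.sh sharelib create -fs hdfs://namenode.example.com:8020 \
+      -locallib /tmp/oozie-sharelib.tar.gz \
+      -extralib mylib=/tmp/mylib/a.jar,/tmp/mylib/b.jar
+
+# Create the Oozie database and also write the SQL to a file.
+$ bin/oozie-setup.sh db create -run -sqlfile /tmp/oozie-create.sql
+</pre></div></div>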
+
+<p>If a <tt>libext/</tt> directory is present in the Oozie installation directory, 
the <tt>oozie-setup.sh</tt> script will copy all JARs it contains into Jetty&#x2019;s 
<tt>webapp/WEB-INF/lib/</tt> directory.</p>
+<p>If the ExtJS ZIP file is present in the <tt>libext/</tt> directory, it will 
be added to Jetty&#x2019;s <tt>webapp/</tt> directory as well. The ExtJS 
library file name must be <tt>ext-2.2.zip</tt>.</p>
+<div class="section">
+<h3><a name="Setting_Up_Oozie_with_an_Alternate_Tomcat"></a>Setting Up Oozie 
with an Alternate Tomcat</h3>
+<p>Use the <tt>addtowar.sh</tt> script to prepare the Oozie server only if 
Oozie will run in a servlet container other than the embedded Jetty 
provided with the distribution.</p>
+<p>The <tt>addtowar.sh</tt> script adds Hadoop JARs, JDBC JARs and the ExtJS 
library to the Oozie WAR file.</p>
+<p>The <tt>addtowar.sh</tt> script options are:</p>
+
+<div>
+<div>
+<pre class="source"> Usage  : addtowar &lt;OPTIONS&gt;
+ Options: -inputwar INPUT_OOZIE_WAR
+          -outputwar OUTPUT_OOZIE_WAR
+          [-hadoop HADOOP_VERSION HADOOP_PATH]
+          [-extjs EXTJS_PATH]
+          [-jars JARS_PATH] (multiple JAR path separated by ':')
+          [-secureWeb WEB_XML_PATH] (path to secure web.xml)
+</pre></div></div>
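+
+<p>A possible invocation, shown only as a sketch: the output location and the 
JDBC driver file names are hypothetical, and the <tt>-hadoop</tt> option is left 
out because the accepted version strings are not listed here.</p>
+
+<div>
+<div>
+<pre class="source"># Build a prepared WAR containing ExtJS and two JDBC driver JARs.
+# Multiple JAR paths are separated by ':'.
+$ bin/addtowar.sh -inputwar oozie.war \
+      -outputwar /tmp/oozie-prepared/oozie.war \
+      -extjs libext/ext-2.2.zip \
+      -jars libext/mysql-connector-java.jar:libext/ojdbc.jar
+</pre></div></div>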
+
+<p>The original <tt>oozie.war</tt> file is in the Oozie server installation 
directory.</p>
+<p>After the Hadoop JARs and the ExtJS library have been added to the 
<tt>oozie.war</tt> file, Oozie is ready to run.</p>
+<p>Delete any previous deployment of the <tt>oozie.war</tt> from the servlet 
container (if using Tomcat, delete <tt>oozie.war</tt> and the <tt>oozie</tt> 
directory from Tomcat&#x2019;s <tt>webapps/</tt> directory).</p>
+<p>Deploy the prepared <tt>oozie.war</tt> file (the one that contains the 
Hadoop JARs and the ExtJS library) in the servlet container (if using Tomcat, 
copy the prepared <tt>oozie.war</tt> file to Tomcat&#x2019;s <tt>webapps/</tt> 
directory).</p>
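+<p>For Tomcat, the delete and deploy steps could look like the sketch below. It 
assumes that <tt>CATALINA_HOME</tt> points to the Tomcat installation and that 
the prepared WAR was written to the hypothetical path 
<tt>/tmp/oozie-prepared/oozie.war</tt>:</p>
+
+<div>
+<div>
+<pre class="source"># Remove any previous deployment of Oozie from Tomcat.
+$ rm -rf "${CATALINA_HOME}/webapps/oozie" "${CATALINA_HOME}/webapps/oozie.war"
+
+# Deploy the prepared WAR file.
+$ cp /tmp/oozie-prepared/oozie.war "${CATALINA_HOME}/webapps/"
+</pre></div></div>
+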
+<p><b>IMPORTANT:</b> Only one Oozie instance can be deployed per Tomcat 
instance.</p></div></div>
+<div class="section">
+<h2><a name="Database_Configuration"></a>Database Configuration</h2>
+<p>Oozie works with HSQL, Derby, MySQL, Oracle, PostgreSQL or SQL Server 
databases.</p>
+<p>By default, Oozie is configured to use Embedded Derby.</p>
+<p>Oozie bundles the JDBC drivers for HSQL, Embedded Derby and PostgreSQL.</p>
+<p>HSQL is normally used for test cases as it is an in-memory database and all 
data is lost every time Oozie is stopped.</p>
+<p>If using Derby, MySQL, Oracle, PostgreSQL, or SQL Server, the Oozie 
database schema must be created using the <tt>ooziedb.sh</tt> command line 
tool.</p>
+<p>If using MySQL, Oracle, or SQL Server, the corresponding JDBC driver JAR 
file must be copied to Oozie&#x2019;s <tt>libext/</tt> directory and it must be 
added to Oozie WAR file using the <tt>bin/addtowar.sh</tt> or the 
<tt>oozie-setup.sh</tt> scripts using the <tt>-jars</tt> option.</p>
+<p><b>IMPORTANT:</b> It is recommended to set the database&#x2019;s timezone 
to GMT (consult your database&#x2019;s documentation on how to do this). 
Databases don&#x2019;t handle Daylight Saving Time shifts correctly, and may 
cause problems if you run any Coordinators with actions scheduled to 
materialize during the 1 hour period where we &#x201c;fall back&#x201d;.  For 
Derby, you can add &#x2018;-Duser.timezone=GMT&#x2019; to <tt>JETTY_OPTS</tt> 
in oozie-env.sh to set this.  Alternatively, if using MySQL, you can have Oozie 
use GMT with MySQL without setting MySQL&#x2019;s timezone to GMT by adding 
&#x2018;useLegacyDatetimeCode=false&amp;serverTimezone=GMT&#x2019; arguments to 
the JDBC URL, <tt>oozie.service.JPAService.jdbc.url</tt>.  Be advised that 
changing the timezone on an existing Oozie database while Coordinators are 
already running may cause Coordinators to shift by the offset of their timezone 
from GMT once after making this change.</p>
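+<p>For Derby, the <tt>JETTY_OPTS</tt> setting mentioned above could be added to 
<tt>conf/oozie-env.sh</tt> along these lines. Appending to any existing 
<tt>JETTY_OPTS</tt> value is an assumption made here so that other options are 
preserved:</p>
+
+<div>
+<div>
+<pre class="source"># conf/oozie-env.sh
+# Run the Oozie JVM (and therefore embedded Derby) in GMT.
+export JETTY_OPTS="${JETTY_OPTS} -Duser.timezone=GMT"
+</pre></div></div>
+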
+<p>The SQL database used by Oozie is configured using the following 
configuration properties (default values shown):</p>
+
+<div>
+<div>
+<pre class="source">  oozie.db.schema.name=oozie
+  oozie.service.JPAService.create.db.schema=false
+  oozie.service.JPAService.validate.db.connection=false
+  oozie.service.JPAService.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
+  
oozie.service.JPAService.jdbc.url=jdbc:derby:${oozie.data.dir}/${oozie.db.schema.name}-db;create=true
+  oozie.service.JPAService.jdbc.username=sa
+  oozie.service.JPAService.jdbc.password=
+  oozie.service.JPAService.pool.max.active.conn=10
+</pre></div></div>
+
+<p><b>NOTE:</b> If the <tt>oozie.service.JPAService.create.db.schema</tt> 
property is set to <tt>true</tt> (default value is <tt>false</tt>), the Oozie 
tables will be created automatically without having to use the 
<tt>ooziedb.sh</tt> command line tool. Setting this property to <tt>true</tt> 
is recommended only for development.</p>
+<p><b>NOTE:</b> If the <tt>oozie.service.JPAService.create.db.schema</tt> 
property is set to <tt>true</tt>, the 
<tt>oozie.service.JPAService.validate.db.connection</tt> property 
value is ignored and Oozie handles it as set to <tt>false</tt>.</p>
+<p>Once the database has been configured in <tt>oozie-site.xml</tt>, 
execute the <tt>ooziedb.sh</tt> command line tool to create the 
database:</p>
+
+<div>
+<div>
+<pre class="source">$ bin/ooziedb.sh create -sqlfile oozie.sql -run
+
+Validate DB Connection.
+DONE
+Check DB schema does not exist
+DONE
+Check OOZIE_SYS table does not exist
+DONE
+Create SQL schema
+DONE
+DONE
+Create OOZIE_SYS table
+DONE
+
+Oozie DB has been created for Oozie version '3.2.0'
+
+The SQL commands have been written to: oozie.sql
+
+$
+</pre></div></div>
+
+<p>NOTE: If using MySQL, Oracle, or SQL Server, copy the corresponding JDBC 
driver JAR file to the <tt>libext/</tt> directory before running the 
<tt>ooziedb.sh</tt> command line tool.</p>
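+<p>As a sketch for MySQL, the sequence might look as follows. The source path 
and file name of the driver JAR are hypothetical:</p>
+
+<div>
+<div>
+<pre class="source"># Make the JDBC driver available to the tool, then create the schema.
+$ mkdir -p libext
+$ cp /path/to/mysql-connector-java.jar libext/
+$ bin/ooziedb.sh create -sqlfile oozie.sql -run
+</pre></div></div>
+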
+<p>NOTE: If, instead of the &#x2018;-run&#x2019; option, the <tt>-sqlfile 
&lt;FILE&gt;</tt> option is used, all the database changes will be written 
to the specified file and the database won&#x2019;t be modified.</p>
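+<p>This makes it possible to review the generated SQL before touching the 
database. A minimal sketch, with <tt>oozie.sql</tt> as an arbitrary file 
name:</p>
+
+<div>
+<div>
+<pre class="source"># Dry run: write the SQL statements to a file without modifying the database.
+$ bin/ooziedb.sh create -sqlfile oozie.sql
+
+# Inspect the statements that would be executed.
+$ less oozie.sql
+</pre></div></div>
+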
+<p>If using HSQL there is no need to use the <tt>ooziedb</tt> command line 
tool as HSQL is an in-memory database. Use the following configuration 
properties in the oozie-site.xml:</p>
+
+<div>
+<div>
+<pre class="source">  oozie.db.schema.name=oozie
+  oozie.service.JPAService.create.db.schema=true
+  oozie.service.JPAService.validate.db.connection=false
+  oozie.service.JPAService.jdbc.driver=org.hsqldb.jdbcDriver
+  oozie.service.JPAService.jdbc.url=jdbc:hsqldb:mem:${oozie.db.schema.name}
+  oozie.service.JPAService.jdbc.username=sa
+  oozie.service.JPAService.jdbc.password=
+  oozie.service.JPAService.pool.max.active.conn=10
+</pre></div></div>
+
+<p>To fine-tune how Oozie retries database operations when database 
connectivity fails or errors occur, set the following properties to other 
values. The defaults are:</p>
+
+<div>
+<div>
+<pre class="source">  oozie.service.JPAService.retry.initial-wait-time.ms=100
+  oozie.service.JPAService.retry.maximum-wait-time.ms=30000
+  oozie.service.JPAService.retry.max-retries=10
+</pre></div></div>
+
+<p>If you set either <tt>oozie.service.JPAService.retry.max-retries</tt> or 
<tt>oozie.service.JPAService.retry.maximum-wait-time.ms</tt> to <tt>0</tt>, no 
retry attempts will be made on any database connectivity issues. The right 
values for these properties also depend on the workflow and coordinator job 
load on Oozie.</p>
+<p>The database operation retry functionality kicks in when there is a 
<tt>javax.persistence.PersistenceException</tt> whose root cause is not part of 
normal everyday operation. Root causes are filtered against a blacklist 
consisting of descendants like <tt>NoSuchResultException</tt>, 
<tt>NonUniqueResultException</tt>, and the like. This way Oozie won&#x2019;t 
retry database operations on errors that are related to the current query 
or are otherwise part of normal operation, and the blacklist stays database 
agnostic.</p>
+<p>This has been tested with a MySQL database failing for 10 seconds every 
minute and an Oozie coordinator job running an Oozie workflow of four workflow 
actions (some of them asynchronous). In this setup Oozie recovered after 
every database outage.</p>
+<p>To set up such a failing MySQL scenario, perform the following steps:</p>
+<ul>
+
+<li>Set <tt>oozie.service.JPAService.connection.data.source</tt> to 
<tt>org.apache.oozie.util.db.BasicDataSourceWrapper</tt> within 
<tt>oozie-site.xml</tt></li>
+<li>Set <tt>oozie.service.JPAService.jdbc.driver</tt> to 
<tt>org.apache.oozie.util.db.FailingMySQLDriverWrapper</tt> within 
<tt>oozie-site.xml</tt></li>
+<li>Restart Oozie server</li>
+<li>Submit / start some workflows, coordinators etc.</li>
+<li>See how Oozie retries on injected database errors by looking at the 
Oozie server logs and grepping for <tt>JPAException</tt> instances with the 
following message prefix: <tt>Deliberately failing to prepare 
statement.</tt></li>
+</ul></div>
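+<p>The last step above can be sketched as follows, assuming the default 
<tt>logs/</tt> directory and that the messages end up in 
<tt>oozie.log</tt>:</p>
+
+<div>
+<div>
+<pre class="source"># List the injected failures that Oozie retried on.
+$ grep "Deliberately failing to prepare statement" logs/oozie.log
+</pre></div></div>
+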
+<div class="section">
+<h2><a name="Database_Migration"></a>Database Migration</h2>
+<p>Oozie provides an easy way to switch between databases without losing any 
data. Oozie servers should be stopped during the database migration process. 
The export of the database can be done using the following command:</p>
+
+<div>
+<div>
+<pre class="source">$ bin/oozie-setup.sh export /tmp/oozie_db.zip
+1 rows exported from OOZIE_SYS
+50 rows exported from WF_JOBS
+340 rows exported from WF_ACTIONS
+10 rows exported from COORD_JOBS
+70 rows exported from COORD_ACTIONS
+0 rows exported from BUNDLE_JOBS
+0 rows exported from BUNDLE_ACTIONS
+0 rows exported from SLA_REGISTRATION
+0 rows exported from SLA_SUMMARY
+</pre></div></div>
+
+<p>The database configuration is read from <tt>oozie-site.xml</tt>. After 
updating the configuration to point to the new database, the tables have to be 
created with <tt>ooziedb.sh</tt> as described in the <a 
href="AG_Install.html#Database_Configuration">Database configuration</a> 
section above. Once the tables are created, they can be filled with data using 
the following command:</p>
+
+<div>
+<div>
+<pre class="source">$ bin/oozie-setup.sh import /tmp/oozie_db.zip
+Loading to Oozie database version 3
+50 rows imported to WF_JOBS
+340 rows imported to WF_ACTIONS
+10 rows imported to COORD_JOBS
+70 rows imported to COORD_ACTIONS
+0 rows imported to BUNDLE_JOBS
+0 rows imported to BUNDLE_ACTIONS
+0 rows imported to SLA_REGISTRATION
+0 rows imported to SLA_SUMMARY
+</pre></div></div>
+
+<p>NOTE: The database version of the zip must match the version of the Oozie 
database it&#x2019;s imported to.</p>
+<p>After starting the Oozie server, the history and the currently running 
workflows should be available.</p>
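+<p>Putting the steps of this section together, a migration might proceed as 
sketched below. The archive path is a placeholder, and using 
<tt>bin/oozied.sh</tt> to stop and start the server is an assumption about how 
the server is managed in a given installation:</p>
+
+<div>
+<div>
+<pre class="source"># 1. Stop the Oozie server and export the current database.
+$ bin/oozied.sh stop
+$ bin/oozie-setup.sh export /tmp/oozie_db.zip
+
+# 2. Edit conf/oozie-site.xml so that the JDBC settings point to the new database.
+
+# 3. Create the tables in the new database, then load the exported data.
+$ bin/ooziedb.sh create -sqlfile oozie.sql -run
+$ bin/oozie-setup.sh import /tmp/oozie_db.zip
+
+# 4. Start the Oozie server again.
+$ bin/oozied.sh start
+</pre></div></div>
+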
+<p><b>IMPORTANT:</b> The tool was primarily developed to make the migration 
from embedded databases (e.g. Derby) to standalone databases (e.g. MySQL, 
PostgreSQL, Oracle, MS SQL Server), though it will work between any supported 
databases. It is <b>not</b> optimized to handle databases over 1 GB. If the 
database size is larger, it should be purged before migration.</p></div>
+<div class="section">
+<h2><a name="Oozie_Configuration"></a>Oozie Configuration</h2>
+<p>By default, Oozie configuration is read from Oozie&#x2019;s <tt>conf/</tt> 
directory.</p>
+<p>The Oozie configuration is distributed in 3 different files:</p>
+<ul>
+
+<li><tt>oozie-site.xml</tt> : Oozie server configuration</li>
+<li><tt>oozie-log4j.properties</tt> : Oozie logging configuration</li>
+<li><tt>adminusers.txt</tt> : Oozie admin users list</li>
+</ul>
+<div class="section">
+<h3><a name="Oozie_Configuration_Properties"></a>Oozie Configuration 
Properties</h3>
+<p>All Oozie configuration properties and their default values are defined in 
the <tt>oozie-default.xml</tt> file.</p>
+<p>Oozie resolves configuration property values in the following order:</p>
+<ul>
+
+<li>If a Java System property is defined, it uses its value</li>
+<li>Else, if the Oozie configuration file (<tt>oozie-site.xml</tt>) contains 
the property, it uses its value</li>
+<li>Else, it uses the default value documented in the 
<tt>oozie-default.xml</tt> file</li>
+</ul>
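+<p>As an example of the first rule, a value from <tt>oozie-site.xml</tt> can 
be overridden by passing a Java System property through <tt>JETTY_OPTS</tt> in 
<tt>conf/oozie-env.sh</tt>. This is a sketch, and the pool size of 
<tt>20</tt> is an arbitrary illustrative value:</p>
+
+<div>
+<div>
+<pre class="source"># conf/oozie-env.sh
+# A Java System property takes precedence over oozie-site.xml and oozie-default.xml.
+export JETTY_OPTS="${JETTY_OPTS} -Doozie.service.JPAService.pool.max.active.conn=20"
+</pre></div></div>
+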
+<p><b>NOTE:</b> The <tt>oozie-default.xml</tt> file found in Oozie&#x2019;s 
<tt>conf/</tt> directory is not used by Oozie; it is there for reference 
purposes only.</p></div>
+<div class="section">
+<h3><a name="Precedence_of_Configuration_Properties"></a>Precedence of 
Configuration Properties</h3>
+<p>For compatibility across Hadoop / Oozie versions, some 
configuration properties can be defined using multiple keys in the launcher 
configuration. Beginning with Oozie 5.0.0, some of them can be overridden, while 
others are prepended to the default configuration values.</p>
+<div class="section">
+<h4><a name="Overriding_Configuration_Values"></a>Overriding Configuration 
Values</h4>
+<p>Overriding happens for the following configuration entries with the 
<tt>oozie.launcher</tt> prefix, and is controlled by the 
<tt>oozie.launcher.override</tt> switch (on by default).</p>
+<p>For those, the general approach is as follows:</p>
+<ul>
+
+<li>check whether a YARN compatible entry is present. If yes, use it to 
override default value</li>
+<li>check whether a MapReduce v2 compatible entry is present. If yes, use it 
to override default value</li>
+<li>check whether a MapReduce v1 compatible entry is present. If yes, use it 
to override default value</li>
+<li>use default value</li>
+</ul>
+<p>Such properties are (legend: YARN / MapReduce v2 / MapReduce v1):</p>
+<ul>
+
+<li>max attempts of the MapReduce Application Master:
+<ul>
+
+<li>N / A</li>
+<li><tt>mapreduce.map.maxattempts</tt></li>
+<li><tt>mapred.map.max.attempts</tt></li>
+</ul>
+</li>
+<li>memory amount in MB of the MapReduce Application Master:
+<ul>
+
+<li><tt>yarn.app.mapreduce.am.resource.mb</tt></li>
+<li><tt>mapreduce.map.memory.mb</tt></li>
+<li><tt>mapred.job.map.memory.mb</tt></li>
+</ul>
+</li>
+<li>CPU vcore count of the MapReduce Application Master:
+<ul>
+
+<li><tt>yarn.app.mapreduce.am.resource.cpu-vcores</tt></li>
+<li><tt>mapreduce.map.cpu.vcores</tt></li>
+<li>N / A</li>
+</ul>
+</li>
+<li>logging level of the MapReduce Application Master:
+<ul>
+
+<li>N / A</li>
+<li><tt>mapreduce.map.log.level</tt></li>
+<li><tt>mapred.map.child.log.level</tt></li>
+</ul>
+</li>
+<li>MapReduce Application Master JVM options:
+<ul>
+
+<li><tt>yarn.app.mapreduce.am.command-opts</tt></li>
+<li><tt>mapreduce.map.java.opts</tt></li>
+<li><tt>mapred.child.java.opts</tt></li>
+</ul>
+</li>
+<li>MapReduce Application Master environment variable settings:
+<ul>
+
+<li><tt>yarn.app.mapreduce.am.env</tt></li>
+<li><tt>mapreduce.map.env</tt></li>
+<li><tt>mapred.child.env</tt></li>
+</ul>
+</li>
+<li>MapReduce Application Master job priority:
+<ul>
+
+<li>N / A</li>
+<li><tt>mapreduce.job.priority</tt></li>
+<li><tt>mapred.job.priority</tt></li>
+</ul>
+</li>
+<li>MapReduce Application Master job queue name:
+<ul>
+
+<li>N / A</li>
+<li><tt>mapreduce.job.queuename</tt></li>
+<li><tt>mapred.job.queue.name</tt></li>
+</ul>
+</li>
+<li>MapReduce View ACL settings:
+<ul>
+
+<li>N / A</li>
+<li><tt>mapreduce.job.acl-view-job</tt></li>
+<li>N / A</li>
+</ul>
+</li>
+<li>MapReduce Modify ACL settings:
+<ul>
+
+<li>N / A</li>
+<li><tt>mapreduce.job.acl-modify-job</tt></li>
+<li>N / A</li>
+</ul>
+</li>
+</ul>
+<p>This list can be extended or modified by adding new configuration entries 
or updating existing values beginning with <tt>oozie.launcher.override.</tt> 
within <tt>oozie-site.xml</tt>. Examples can be found in 
<tt>oozie-default.xml</tt>.</p></div>
+<div class="section">
+<h4><a name="Prepending_Configuration_Values"></a>Prepending Configuration 
Values</h4>
+<p>Prepending happens for the following configuration entries with the 
<tt>oozie.launcher</tt> prefix, and is controlled by the 
<tt>oozie.launcher.prepend</tt> switch (on by default).</p>
+<p>For those, the general approach is as follows:</p>
+<ul>
+
+<li>check whether a YARN compatible entry is present. If yes, use it to 
prepend to default value</li>
+<li>use default value</li>
+</ul>
+<p>Such properties are (legend: YARN only):</p>
+<ul>
+
+<li>MapReduce Application Master JVM options: 
<tt>yarn.app.mapreduce.am.admin-command-opts</tt></li>
+<li>MapReduce Application Master environment settings: 
<tt>yarn.app.mapreduce.am.admin.user.env</tt></li>
+</ul>
+<p>This list can be extended or modified by adding new configuration entries 
or updating existing values beginning with <tt>oozie.launcher.prepend.</tt> 
within <tt>oozie-site.xml</tt>. Examples can be found in 
<tt>oozie-default.xml</tt>.</p></div></div>
+<div class="section">
+<h3><a name="Logging_Configuration"></a>Logging Configuration</h3>
+<p>By default, Oozie log configuration is defined in the 
<tt>oozie-log4j.properties</tt> configuration file.</p>
+<p>If the Oozie log configuration file changes, Oozie reloads the new settings 
automatically.</p>
+<p>By default, Oozie logs to Oozie&#x2019;s <tt>logs/</tt> directory.</p>
+<p>Oozie logs in 4 different files:</p>
+<ul>
+
+<li>oozie.log: web services log streaming works from this log</li>
+<li>oozie-ops.log: messages for Admin/Operations to monitor</li>
+<li>oozie-instrumentation.log: instrumentation data, every 60 seconds 
(configurable)</li>
+<li>oozie-audit.log: audit messages, workflow jobs changes</li>
+</ul>
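+<p>A quick way to watch these files while the server is running, assuming the 
default <tt>logs/</tt> directory:</p>
+
+<div>
+<div>
+<pre class="source"># Follow the main server log as new lines are written.
+$ tail -f logs/oozie.log
+
+# Show the most recent audit messages.
+$ tail -n 50 logs/oozie-audit.log
+</pre></div></div>
+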
+<p>The embedded Jetty and embedded Derby log files are also written to 
Oozie&#x2019;s <tt>logs/</tt> directory.</p></div>
+<div class="section">
+<h3><a name="Oozie_User_Authentication_Configuration"></a>Oozie User 
Authentication Configuration</h3>
+<p>Oozie supports Kerberos HTTP SPNEGO authentication, pseudo/simple 
authentication and anonymous access for client connections.</p>
+<p>Anonymous access (<b>default</b>) does not require the user to authenticate. 
The user ID is obtained from the job properties on job submission 
operations; other operations are anonymous.</p>
+<p>Pseudo/simple authentication requires the user to specify the user name on 
the request. This is done by the PseudoAuthenticator class, which injects the 
<tt>user.name</tt> parameter in the query string of all requests. The 
<tt>user.name</tt> parameter value is taken from the client process Java System 
property <tt>user.name</tt>.</p>
+<p>Kerberos HTTP SPNEGO authentication requires the user to perform a Kerberos 
HTTP SPNEGO authentication sequence.</p>
+<p>If Pseudo/simple or Kerberos HTTP SPNEGO authentication mechanisms are 
used, Oozie will return the user an authentication token HTTP Cookie that can 
be used in later requests as identity proof.</p>
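+<p>Seen from a generic HTTP client, the two mechanisms could be exercised 
roughly as follows. This is a sketch using <tt>curl</tt>. The host name, the 
user <tt>alice</tt> and the cookie file name are hypothetical, and the port and 
the <tt>v1/admin/status</tt> endpoint are assumptions about the target 
server:</p>
+
+<div>
+<div>
+<pre class="source"># Pseudo/simple authentication: the user name travels in the query string.
+$ curl "http://oozie.example.com:11000/oozie/v1/admin/status?user.name=alice"
+
+# Kerberos HTTP SPNEGO: needs a valid Kerberos ticket (for example from kinit).
+# Save the authentication token cookie returned by the server.
+$ curl --negotiate -u : -c cookies.txt "http://oozie.example.com:11000/oozie/v1/admin/status"
+
+# Later requests can present the saved cookie as identity proof.
+$ curl -b cookies.txt "http://oozie.example.com:11000/oozie/v1/admin/status"
+</pre></div></div>
+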
+<p>Oozie uses Apache Hadoop-Auth (Java HTTP SPNEGO) library for 
authentication. This library can be extended to support other authentication 
mechanisms.</p>
+<p>Oozie user authentication is configured using the following configuration 
properties (default values shown):</p>
+
+<div>
+<div>
+<pre class="source">  oozie.authentication.type=simple
+  oozie.authentication.token.validity=36000
+  oozie.authentication.signature.secret=
+  oozie.authentication.cookie.domain=
+  oozie.authentication.simple.anonymous.allowed=true
+  oozie.authentication.kerberos.principal=HTTP/localhost@${local.realm}
+  
oozie.authentication.kerberos.keytab=${oozie.service.HadoopAccessorService.keytab.file}
+</pre></div></div>
+
+<p>The <tt>type</tt> defines the authentication used for the Oozie HTTP 
endpoint. The supported values are: simple | kerberos | 
#AUTHENTICATION_HANDLER_CLASSNAME#.</p>
+<p>The <tt>token.validity</tt> indicates how long (in seconds) an 
authentication token is valid before it has to be renewed.</p>
+<p>The <tt>signature.secret</tt> is the signature secret for signing the 
authentication tokens. It is recommended to not set this, in which case Oozie 
will randomly generate one on startup.</p>
+<p>The <tt>oozie.authentication.cookie.domain</tt> is the domain to use for the 
HTTP cookie that stores the authentication token. For authentication to 
work correctly across the web consoles of all Hadoop nodes, the domain must be 
set correctly.</p>
+<p>The <tt>simple.anonymous.allowed</tt> indicates if anonymous requests are 
allowed. This setting is meaningful only when using &#x2018;simple&#x2019; 
authentication.</p>
+<p>The <tt>kerberos.principal</tt> indicates the Kerberos principal to be used 
for the HTTP endpoint. The principal MUST start with &#x2018;HTTP/&#x2019; as per 
the Kerberos HTTP SPNEGO specification.</p>
+<p>The <tt>kerberos.keytab</tt> indicates the location of the keytab file with 
the credentials for the principal. It should be the same keytab file Oozie uses 
for its Kerberos credentials for Hadoop.</p></div>
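+<p>As an illustration only, a Kerberos HTTP SPNEGO setup in <tt>oozie-site.xml</tt> might override the defaults as sketched below. The property names are the ones listed above; the host name <tt>oozie.example.com</tt>, the realm <tt>EXAMPLE.COM</tt> and the keytab path are hypothetical placeholders, not values shipped with Oozie.</p>
+
+<div>
+<div>
+<pre class="source">  &lt;!-- Sketch only: host, realm and keytab path are placeholder assumptions --&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.authentication.type&lt;/name&gt;
+    &lt;value&gt;kerberos&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;!-- The principal must start with HTTP/ --&gt;
+    &lt;name&gt;oozie.authentication.kerberos.principal&lt;/name&gt;
+    &lt;value&gt;HTTP/oozie.example.com@EXAMPLE.COM&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;!-- Ideally the same keytab Oozie uses for its Hadoop credentials --&gt;
+    &lt;name&gt;oozie.authentication.kerberos.keytab&lt;/name&gt;
+    &lt;value&gt;/etc/security/keytabs/oozie.keytab&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>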
+<div class="section">
+<h3><a name="Oozie_Hadoop_Authentication_Configuration"></a>Oozie Hadoop 
Authentication Configuration</h3>
+<p>Oozie works with Hadoop versions which support Kerberos authentication.</p>
+<p>Oozie Hadoop authentication is configured using the following configuration 
properties (default values shown):</p>
+
+<div>
+<div>
+<pre class="source">  
oozie.service.HadoopAccessorService.kerberos.enabled=false
+  local.realm=LOCALHOST
+  oozie.service.HadoopAccessorService.keytab.file=${user.home}/oozie.keytab
+  
oozie.service.HadoopAccessorService.kerberos.principal=${user.name}/localhost@${local.realm}
+</pre></div></div>
+
+<p>The above default values are for a Hadoop 0.20 secure distribution (with 
support for Kerberos authentication).</p>
+<p>To enable Kerberos authentication, the following property must be set:</p>
+
+<div>
+<div>
+<pre class="source">  oozie.service.HadoopAccessorService.kerberos.enabled=true
+</pre></div></div>
+
+<p>When using Kerberos authentication, the following properties must be set to 
the correct values (default values shown):</p>
+
+<div>
+<div>
+<pre class="source">  local.realm=LOCALHOST
+  oozie.service.HadoopAccessorService.keytab.file=${user.home}/oozie.keytab
+  
oozie.service.HadoopAccessorService.kerberos.principal=${user.name}/localhost@${local.realm}
+</pre></div></div>
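+
+<p>For illustration, the same settings could be expressed in <tt>oozie-site.xml</tt> roughly as follows. The realm <tt>EXAMPLE.COM</tt>, the principal <tt>oozie/oozie.example.com</tt> and the keytab path are hypothetical and would need to be replaced with site-specific values.</p>
+
+<div>
+<div>
+<pre class="source">  &lt;!-- Sketch only: realm, principal and keytab path are placeholder assumptions --&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.service.HadoopAccessorService.kerberos.enabled&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;local.realm&lt;/name&gt;
+    &lt;value&gt;EXAMPLE.COM&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.service.HadoopAccessorService.keytab.file&lt;/name&gt;
+    &lt;value&gt;/etc/security/keytabs/oozie.keytab&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.service.HadoopAccessorService.kerberos.principal&lt;/name&gt;
+    &lt;value&gt;oozie/oozie.example.com@EXAMPLE.COM&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>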
+
+<p><b>IMPORTANT:</b> When using Oozie with a Hadoop 0.20 with Security 
distribution, the Oozie user in Hadoop must be configured as a proxy 
user.</p></div>
+<div class="section">
+<h3><a name="User_ProxyUser_Configuration"></a>User ProxyUser 
Configuration</h3>
+<p>Oozie supports impersonation or proxyuser functionality (identical to 
Hadoop proxyuser capabilities and conceptually similar to Unix 
&#x2018;sudo&#x2019;).</p>
+<p>Proxyuser enables other systems that are Oozie clients to submit jobs on 
behalf of other users.</p>
+<p>Because proxyuser is a powerful capability, Oozie provides the following 
restriction capabilities (similar to Hadoop):</p>
+<ul>
+
+<li>Proxyuser must be configured explicitly for each proxyuser user.</li>
+<li>A proxyuser user can be restricted to impersonate other users from a set 
of hosts.</li>
+<li>A proxyuser user can be restricted to impersonate users belonging to a set 
of groups.</li>
+</ul>
+<p>There are 2 configuration properties needed to set up a proxyuser:</p>
+<ul>
+
+<li>oozie.service.ProxyUserService.proxyuser.#USER#.hosts: hosts from where 
the user #USER# can impersonate other users.</li>
+<li>oozie.service.ProxyUserService.proxyuser.#USER#.groups: groups the users 
being impersonated by user #USER# must belong to.</li>
+</ul>
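+<p>A minimal sketch of the two properties for a hypothetical proxyuser named <tt>appuser</tt> is shown below. The user name, the host <tt>gateway.example.com</tt> and the group <tt>analysts</tt> are assumptions chosen only for illustration.</p>
+
+<div>
+<div>
+<pre class="source">  &lt;!-- Sketch only: appuser, gateway.example.com and analysts are hypothetical --&gt;
+  &lt;property&gt;
+    &lt;!-- Host from which appuser may impersonate other users --&gt;
+    &lt;name&gt;oozie.service.ProxyUserService.proxyuser.appuser.hosts&lt;/name&gt;
+    &lt;value&gt;gateway.example.com&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;!-- Group the impersonated users must belong to --&gt;
+    &lt;name&gt;oozie.service.ProxyUserService.proxyuser.appuser.groups&lt;/name&gt;
+    &lt;value&gt;analysts&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>
+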
+<p>Both properties support the &#x2018;*&#x2019; wildcard as value, although 
this is recommended only for testing/development.</p></div>
+<div class="section">
+<h3><a name="User_Authorization_Configuration"></a>User Authorization 
Configuration</h3>
+<p>Oozie has a basic authorization model:</p>
+<ul>
+
+<li>Users have read access to all jobs</li>
+<li>Users have write access to their own jobs</li>
+<li>Users have write access to jobs based on an Access Control List (list of 
users and groups)</li>
+<li>Users have read access to admin operations</li>
+<li>Admin users have write access to all jobs</li>
+<li>Admin users have write access to admin operations</li>
+</ul>
+<p>If security is disabled all users are admin users.</p>
+<p>Oozie security is set via the following configuration property (default 
value shown):</p>
+
+<div>
+<div>
+<pre class="source">  oozie.service.AuthorizationService.security.enabled=false
+</pre></div></div>
+
+<p>NOTE: the old ACL model where a group was provided is still supported if 
the following property is set in <tt>oozie-site.xml</tt>:</p>
+
+<div>
+<div>
+<pre class="source">  
oozie.service.AuthorizationService.default.group.as.acl=true
+</pre></div></div>
+
+<div class="section">
+<h4><a name="Defining_Admin_Users"></a>Defining Admin Users</h4>
+<p>Admin users are determined from the list of admin groups specified in the 
<tt>oozie.service.AuthorizationService.admin.groups</tt> property. Use commas 
to separate multiple groups; spaces, tabs and newline characters are trimmed.</p>
+<p>If the above property for admin groups is not set, admin users can be 
defined in the following ways. The list of admin users can be in 
the <tt>conf/adminusers.txt</tt> file. The syntax of this file is:</p>
+<ul>
+
+<li>One user name per line</li>
+<li>Empty lines and lines starting with &#x2018;#&#x2019; are ignored</li>
+</ul>
+<p>Admin users can also be defined in the 
<tt>oozie.service.AuthorizationService.admin.users</tt> property. Use commas to 
separate multiple admin users; spaces, tabs and newline characters are 
trimmed.</p>
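+<p>As a sketch, enabling security and naming admin users directly in <tt>oozie-site.xml</tt> might look like this. The user names <tt>alice</tt> and <tt>bob</tt> are hypothetical, and the fragment assumes that the admin groups property is left unset.</p>
+
+<div>
+<div>
+<pre class="source">  &lt;!-- Sketch only: alice and bob are hypothetical user names --&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.service.AuthorizationService.security.enabled&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;!-- Comma-separated list of admin users --&gt;
+    &lt;name&gt;oozie.service.AuthorizationService.admin.users&lt;/name&gt;
+    &lt;value&gt;alice,bob&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>
+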
+<p>If admin users are defined using both methods, the effective 
list of admin users will be the union of the admin users found in 
<tt>adminusers.txt</tt> and those specified with 
<tt>oozie.service.AuthorizationService.admin.users</tt>.</p></div>
+<div class="section">
+<h4><a name="Defining_Access_Control_Lists"></a>Defining Access Control 
Lists</h4>
+<p>Access Control Lists are defined in the following ways:</p>
+<ul>
+
+<li>workflow job submission over CLI: configuration property 
<tt>group.name</tt> of <tt>job.properties</tt></li>
+<li>workflow job submission over HTTP: configuration property 
<tt>group.name</tt> of the XML submitted over HTTP</li>
+<li>workflow job re-run: configuration property <tt>oozie.job.acl</tt> 
(preferred) or configuration property <tt>group.name</tt> of 
<tt>job.properties</tt></li>
+<li>coordinator job submission over CLI: configuration property 
<tt>oozie.job.acl</tt> (preferred) or configuration property 
<tt>group.name</tt> of <tt>job.properties</tt></li>
+<li>bundle job submission over CLI: configuration property 
<tt>oozie.job.acl</tt> (preferred) or configuration property 
<tt>group.name</tt> of <tt>job.properties</tt></li>
+</ul>
+<p>For all other workflow, coordinator, or bundle actions, the ACL set 
beforehand will be used as the basis.</p>
+<p>Once the ACL for the job is defined, Oozie will check over HDFS whether the 
user trying to perform a specific action is part of the necessary group(s). For 
implementation details please check out 
<tt>org.apache.hadoop.security.Groups#getGroups(String user)</tt>.</p>
+<p>Note that it&#x2019;s enough that the submitting user be part of at least 
one group of the ACL. Note also that the ACL can contain user names as well. If 
there is an ACL defined and the submitting user isn&#x2019;t part of any group 
or user name present in the ACL, an <tt>AuthorizationException</tt> is 
thrown.</p>
+<p><b>Example: A typical ACL setup</b></p>
+<p>Detail of <tt>job.properties</tt> on workflow job submission:</p>
+
+<div>
+<div>
+<pre class="source">user.name=joe
+group.name=marketing,admin,qa,root
+</pre></div></div>
+
+<p>HDFS group membership of HDFS user <tt>joe</tt> is <tt>qa</tt>. That is, 
the check to 
<tt>org.apache.hadoop.security.Groups#getGroups(&quot;joe&quot;)</tt> returns 
<tt>qa</tt>. Hence, the ACL check will pass inside <tt>AuthorizationService</tt>, 
because the <tt>user.name</tt> provided belongs to at least one of the ACL list 
elements provided as <tt>group.name</tt>.</p></div></div>
+<div class="section">
+<h3><a name="Oozie_System_ID_Configuration"></a>Oozie System ID 
Configuration</h3>
+<p>Oozie has a system ID that is used to generate the Oozie temporary 
runtime directory, the workflow job IDs, and the workflow action IDs.</p>
+<p>Two Oozie systems running with the same ID will not conflict, but 
when troubleshooting it is easier to identify resources created/used 
by the different Oozie systems if they have different system IDs (default value 
shown):</p>
+
+<div>
+<div>
+<pre class="source">  oozie.system.id=oozie-${user.name}
+</pre></div></div>
+</div>
+<div class="section">
+<h3><a name="Filesystem_Configuration"></a>Filesystem Configuration</h3>
+<p>Oozie lets you configure the allowed filesystems by using the following 
configuration property in oozie-site.xml:</p>
+
+<div>
+<div>
+<pre class="source">  &lt;property&gt;
+    
&lt;name&gt;oozie.service.HadoopAccessorService.supported.filesystems&lt;/name&gt;
+    &lt;value&gt;hdfs&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>
+
+<p>The above value, <tt>hdfs</tt>, which is the default, means that Oozie will 
only allow HDFS filesystems to be used.  Examples of filesystems that 
Oozie is compatible with are: hdfs, hftp, webhdfs, and viewfs.  Multiple 
filesystems can be specified as comma-separated values.  Setting the value to * 
allows any filesystem type, effectively disabling this check.</p></div>
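+<p>For example, allowing several filesystem types at once could be sketched as follows. The particular selection of schemes is an assumption made for illustration; use only the ones your cluster actually needs.</p>
+
+<div>
+<div>
+<pre class="source">  &lt;!-- Sketch only: the chosen set of schemes is illustrative --&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.service.HadoopAccessorService.supported.filesystems&lt;/name&gt;
+    &lt;value&gt;hdfs,webhdfs,viewfs&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>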
+<div class="section">
+<h3><a name="HCatalog_Configuration"></a>HCatalog Configuration</h3>
+<p>Refer to the <a href="DG_HCatalogIntegration.html">Oozie HCatalog 
Integration</a> document for an overview of HCatalog and integration of Oozie 
with HCatalog. This section explains the various settings to be configured in 
oozie-site.xml on the Oozie server to enable Oozie to work with HCatalog.</p>
+<p><b>Adding HCatalog jars to Oozie war:</b></p>
+<p>For the Oozie server to talk to the HCatalog server, the HCatalog and Hive 
jars need to be in the server classpath. The hive-site.xml file, which has the 
configuration to talk to the HCatalog server, also needs to be in the classpath 
or specified by the following configuration property in oozie-site.xml:</p>
+
+<div>
+<div>
+<pre class="source">  &lt;property&gt;
+    
&lt;name&gt;oozie.service.HCatAccessorService.hcat.configuration&lt;/name&gt;
+    &lt;value&gt;/local/filesystem/path/to/hive-site.xml&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>
+
+<p>The hive-site.xml can also be placed in a location on HDFS and the above 
property can have a value of <tt>hdfs://HOST:PORT/path/to/hive-site.xml</tt> to 
point there instead of the local file system.</p>
+<p>The oozie-[version]-hcataloglibs.tar.gz in the Oozie distribution bundles 
the required HCatalog and Hive jars that need to be placed in the Oozie server 
classpath. If using a version of HCatalog bundled in Oozie hcataloglibs/, copy 
the corresponding HCatalog jars from hcataloglibs/ to the libext/ directory. If 
using a different version of HCatalog, copy the required HCatalog jars from 
that version to the libext/ directory. This needs to be done before running the 
<tt>oozie-setup.sh</tt> script so that these jars get added for Oozie.</p>
+<p><b>Configure HCatalog URI Handling:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;property&gt;
+    &lt;name&gt;oozie.service.URIHandlerService.uri.handlers&lt;/name&gt;
+    
&lt;value&gt;org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler&lt;/value&gt;
+    &lt;description&gt;
+        Enlist the different uri handlers supported for data availability 
checks.
+    &lt;/description&gt;
+  &lt;/property&gt;
+</pre></div></div>
+
+<p>The above configuration defines the different URI handlers which check for 
the existence of data dependencies defined in a Coordinator. The default value is 
<tt>org.apache.oozie.dependency.FSURIHandler</tt>. FSURIHandler supports URIs 
with schemes defined in the configuration 
<tt>oozie.service.HadoopAccessorService.supported.filesystems</tt> which are 
hdfs, hftp and webhdfs by default. HCatURIHandler supports URIs with the 
<tt>hcat</tt> scheme.</p>
+<p><b>Configure HCatalog services:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;property&gt;
+    &lt;name&gt;oozie.services.ext&lt;/name&gt;
+    &lt;value&gt;
+        org.apache.oozie.service.JMSAccessorService,
+        org.apache.oozie.service.PartitionDependencyManagerService,
+        org.apache.oozie.service.HCatAccessorService
+      &lt;/value&gt;
+    &lt;description&gt;
+          To add/replace services defined in 'oozie.services' with custom 
implementations.
+          Class names must be separated by commas.
+    &lt;/description&gt;
+  &lt;/property&gt;
+</pre></div></div>
+
+<p>PartitionDependencyManagerService and HCatAccessorService are required to 
work with HCatalog and support Coordinators having HCatalog uris as data 
dependency. If the HCatalog server is configured to publish partition 
availability notifications to a JMS compliant messaging provider like ActiveMQ, 
then JMSAccessorService needs to be added to <tt>oozie.services.ext</tt> to 
handle those notifications.</p>
+<p><b>Configure JMS Provider JNDI connection mapping for HCatalog:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;property&gt;
+    &lt;name&gt;oozie.service.HCatAccessorService.jmsconnections&lt;/name&gt;
+    &lt;value&gt;
+      
hcat://hcatserver.colo1.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.colo1.com:61616,
+      
default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://broker.colo.com:61616;connectionFactoryNames#ConnectionFactory
+    &lt;/value&gt;
+    &lt;description&gt;
+        Specify the map  of endpoints to JMS configuration properties. In 
general, endpoint
+        identifies the HCatalog server URL. &quot;default&quot; is used if no 
endpoint is mentioned
+        in the query. If some JMS property is not defined, the system will use 
the property
+        defined in jndi.properties. The jndi.properties file is retrieved from the 
application classpath.
+        Mapping rules can also be provided for mapping Hcatalog servers to 
corresponding JMS providers.
+        
hcat://${1}.${2}.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}.com:61616
+    &lt;/description&gt;
+  &lt;/property&gt;
+</pre></div></div>
+
+<p>Currently HCatalog does not provide APIs to get the connection details to 
connect to the JMS Provider it publishes notifications to. It only has APIs 
which provide the topic name in the JMS Provider to which the notifications are 
published for a given database table. So the JMS Provider&#x2019;s connection 
properties need to be manually configured in Oozie using the above setting. 
You can either provide a <tt>default</tt> JNDI configuration which will be used 
as the JMS Provider for all HCatalog servers, specify a configuration 
per HCatalog server URL, or provide a configuration based on a rule matching 
multiple HCatalog server URLs. For example, with the configuration 
<tt>hcat://${1}.${2}.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}.com:61616</tt>,
 a request URL of <tt>hcat://server1.colo1.com:8020</tt> will map 
to <tt>tcp://broker.colo1.com:61616</tt>, <tt>hcat://server2.colo2.com:8020</tt> 
will map to <tt>tcp://broker.colo2.com:61616</tt>, and so on.</p>
+<p><b>Configure HCatalog Polling Frequency:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;property&gt;
+    &lt;name&gt;oozie.service.coord.push.check.requeue.interval&lt;/name&gt;
+    &lt;value&gt;600000&lt;/value&gt;
+    &lt;description&gt;Command re-queue interval for push dependencies (in 
milliseconds).
+    &lt;/description&gt;
+  &lt;/property&gt;
+</pre></div></div>
+
+<p>If there is no JMS Provider configured for an HCatalog Server, then Oozie 
polls HCatalog based on the frequency defined in 
<tt>oozie.service.coord.input.check.requeue.interval</tt>. This config also 
applies to HDFS polling. If there is a JMS provider configured for an HCatalog 
Server, then Oozie polls HCatalog based on the frequency defined in 
<tt>oozie.service.coord.push.check.requeue.interval</tt> as a fallback. The 
defaults for <tt>oozie.service.coord.input.check.requeue.interval</tt> and 
<tt>oozie.service.coord.push.check.requeue.interval</tt> are 1 minute and 10 
minutes respectively.</p></div>
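+<p>The polling interval used when no JMS Provider is configured can be tuned in the same way. The fragment below is a sketch that doubles the 1 minute default to 2 minutes; it assumes this property is expressed in milliseconds like the push interval, which should be confirmed against <tt>oozie-default.xml</tt>.</p>
+
+<div>
+<div>
+<pre class="source">  &lt;!-- Sketch only: assumes the value is in milliseconds (120000 = 2 minutes) --&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.service.coord.input.check.requeue.interval&lt;/name&gt;
+    &lt;value&gt;120000&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>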
+<div class="section">
+<h3><a name="Notifications_Configuration"></a>Notifications Configuration</h3>
+<p>Oozie supports publishing notifications to a JMS Provider for job status 
changes and SLA met and miss events. For more information on the feature, refer 
to the <a href="DG_JMSNotifications.html">JMS Notifications</a> documentation. Oozie 
can also send email notifications on SLA misses.</p>
+<ul>
+
+<li>
+
+<p><b>Message Broker Installation</b>: <br /> For Oozie to send/receive 
messages, a JMS-compliant broker should be installed. Apache ActiveMQ is a 
popular JMS-compliant broker usable for this purpose. See <a 
class="externalLink" 
href="http://activemq.apache.org/getting-started.html">here</a> for 
instructions on installing and running ActiveMQ.</p>
+</li>
+<li>
+
+<p><b>Services</b>: <br /> Add/modify <tt>oozie.services.ext</tt> property in 
<tt>oozie-site.xml</tt> to include the following services.</p>
+</li>
+</ul>
+
+<div>
+<div>
+<pre class="source">     &lt;property&gt;
+        &lt;name&gt;oozie.services.ext&lt;/name&gt;
+        &lt;value&gt;
+            org.apache.oozie.service.JMSAccessorService,
+            org.apache.oozie.service.JMSTopicService,
+            org.apache.oozie.service.EventHandlerService,
+            org.apache.oozie.sla.service.SLAService
+        &lt;/value&gt;
+     &lt;/property&gt;
+</pre></div></div>
+
+<ul>
+
+<li><b>Event Handlers</b>: <br /></li>
+</ul>
+
+<div>
+<div>
+<pre class="source">     &lt;property&gt;
+        
&lt;name&gt;oozie.service.EventHandlerService.event.listeners&lt;/name&gt;
+        &lt;value&gt;
+            org.apache.oozie.jms.JMSJobEventListener,
+            org.apache.oozie.sla.listener.SLAJobEventListener,
+            org.apache.oozie.jms.JMSSLAEventListener,
+            org.apache.oozie.sla.listener.SLAEmailEventListener
+        &lt;/value&gt;
+     &lt;/property&gt;
+</pre></div></div>
+
+<p>It is also recommended to increase 
<tt>oozie.service.SchedulerService.threads</tt> to 15 for faster event 
processing and notification sending. The event listeners and their functions are 
as follows: <br /> JMSJobEventListener - Sends JMS job notifications <br /> 
JMSSLAEventListener - Sends JMS SLA notifications <br /> SLAEmailEventListener 
- Sends Email SLA notifications <br /> SLAJobEventListener - Processes job 
events and calculates SLA. Does not send any notifications</p>
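+<p>The recommended thread count could be set in <tt>oozie-site.xml</tt> as sketched here; the value 15 is the one suggested above and may need adjusting for your workload.</p>
+
+<div>
+<div>
+<pre class="source">     &lt;!-- Sketch only: 15 is the recommendation from the text, not a measured optimum --&gt;
+     &lt;property&gt;
+        &lt;name&gt;oozie.service.SchedulerService.threads&lt;/name&gt;
+        &lt;value&gt;15&lt;/value&gt;
+     &lt;/property&gt;
+</pre></div></div>
+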
+<ul>
+
+<li><b>JMS properties</b>:  <br /> Add 
<tt>oozie.jms.producer.connection.properties</tt> property in 
<tt>oozie-site.xml</tt>. Its value corresponds to an identifier (e.g. default) 
assigned to a semi-colon separated key#value list of properties from your JMS 
broker&#x2019;s <tt>jndi.properties</tt> file. The important properties are 
<tt>java.naming.factory.initial</tt> and <tt>java.naming.provider.url</tt>.</li>
+</ul>
+<p>As an example, if using ActiveMQ in a local environment, the property can be 
set to:</p>
+
+<div>
+<div>
+<pre class="source">     &lt;property&gt;
+        &lt;name&gt;oozie.jms.producer.connection.properties&lt;/name&gt;
+        &lt;value&gt;
+            
java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory
+        &lt;/value&gt;
+     &lt;/property&gt;
+</pre></div></div>
+
+<ul>
+
+<li><b>JMS Topic name</b>: <br /> JMS consumers listen on a particular 
&#x201c;topic&#x201d;. Hence Oozie needs to define a topic variable with which 
to publish messages about the various jobs.</li>
+</ul>
+
+<div>
+<div>
+<pre class="source">     &lt;property&gt;
+        &lt;name&gt;oozie.service.JMSTopicService.topic.name&lt;/name&gt;
+        &lt;value&gt;
+            default=${username}
+        &lt;/value&gt;
+        &lt;description&gt;
+            Topic options are ${username}, ${jobId}, or a fixed string which 
can be specified as default or for a
+            particular job type.
+            For example, to have a fixed string topic for workflows, coordinators 
and bundles,
+            specify in the following comma-separated format: 
{jobtype1}={some_string1}, {jobtype2}={some_string2}
+            where job type can be WORKFLOW, COORDINATOR or BUNDLE.
+            Following example defines topic for workflow job, workflow action, 
coordinator job, coordinator action,
+            bundle job and bundle action
+            WORKFLOW=workflow,
+            COORDINATOR=coordinator,
+            BUNDLE=bundle
+            For jobs with no defined topic, default topic will be ${username}
+        &lt;/description&gt;
+     &lt;/property&gt;
+</pre></div></div>
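+
+<p>Following the comma-separated format given in the description, fixed topics per job type might be configured as in this sketch. The topic strings <tt>workflow</tt> and <tt>coordinator</tt> are arbitrary examples; bundle jobs have no entry here and, per the description, would fall back to the default <tt>${username}</tt> topic.</p>
+
+<div>
+<div>
+<pre class="source">     &lt;!-- Sketch only: topic strings are arbitrary illustrative names --&gt;
+     &lt;property&gt;
+        &lt;name&gt;oozie.service.JMSTopicService.topic.name&lt;/name&gt;
+        &lt;value&gt;
+            WORKFLOW=workflow,
+            COORDINATOR=coordinator
+        &lt;/value&gt;
+     &lt;/property&gt;
+</pre></div></div>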
+
+<p>Another related property is the topic prefix.</p>
+
+<div>
+<div>
+<pre class="source">     &lt;property&gt;
+        &lt;name&gt;oozie.service.JMSTopicService.topic.prefix&lt;/name&gt;
+        &lt;value&gt;&lt;/value&gt;
+        &lt;description&gt;
+            This can be used to add a prefix to the topic in 
oozie.service.JMSTopicService.topic.name. For example: oozie.
+        &lt;/description&gt;
+     &lt;/property&gt;
+</pre></div></div>
+</div>
+<div class="section">
+<h3><a name="Setting_Up_Oozie_with_HTTPS_SSL"></a>Setting Up Oozie with HTTPS 
(SSL)</h3>
+<p><b>IMPORTANT</b>: The default HTTPS configuration will cause all Oozie URLs 
to use HTTPS except for the JobTracker callback URLs. This simplifies 
configuration (no changes are needed outside of Oozie) and is acceptable because 
Oozie doesn&#x2019;t inherently trust the callbacks anyway; they are used as 
hints.</p>
+<p>The related environment variables are explained at <a 
href="AG_Install.html#Environment_Setup">Environment Setup</a>.</p>
+<p>You can use either a certificate from a Certificate Authority or a 
Self-Signed Certificate.  Using a self-signed certificate requires some 
additional configuration on each Oozie client machine.  If possible, a 
certificate from a Certificate Authority is recommended because it&#x2019;s 
simpler to configure.</p>
+<p>There are also some additional considerations when using Oozie HA with 
HTTPS.</p>
+<div class="section">
+<h4><a name="To_use_a_Self-Signed_Certificate"></a>To use a Self-Signed 
Certificate</h4>
+<p>There are many ways to create a Self-Signed Certificate; this is just one 
way.  We will be using the <a class="externalLink" 
href="http://docs.oracle.com/javase/6/docs/technotes/tools/solaris/keytool.html">keytool</a>
 program, which is included with your JRE. If it&#x2019;s not on your path, you 
should be able to find it in $JAVA_HOME/bin.</p>
+<ol style="list-style-type: decimal">
+
+<li>
+
+<p>Run the following command (as the Oozie user) to create the keystore file, 
which will be named <tt>.keystore</tt> and located in the Oozie user&#x2019;s 
home directory.</p>
+
+<div>
+<div>
+<pre class="source">keytool -genkeypair -alias jetty -keyalg RSA -dname 
&quot;CN=hostname&quot; -storepass password -keypass password
+</pre></div></div>
+
+<p>The <tt>hostname</tt> should be the host name of the Oozie Server or a 
wildcard on the subdomain it belongs to.  Make sure to include the 
&#x201c;CN=&#x201d; part.  You can change <tt>storepass</tt> and 
<tt>keypass</tt> values, but they should be the same.  If you do want to use 
something other than password, you&#x2019;ll also need to change the value of 
the <tt>oozie.https.keystore.pass</tt> property in <tt>oozie-site.xml</tt> to 
match; <tt>password</tt> is the default.</p>
+<p>For example, if your Oozie server was at oozie.int.example.com, then you 
would do this:</p>
+
+<div>
+<div>
+<pre class="source">keytool -genkeypair -alias jetty -keyalg RSA -dname 
&quot;CN=oozie.int.example.com&quot; -storepass password -keypass password
+</pre></div></div>
+
+<p>If you&#x2019;re going to be using Oozie HA, it&#x2019;s simplest if you 
have a single certificate that all Oozie servers in the HA group can use. To do 
that, you&#x2019;ll need to use a wildcard on the subdomain it belongs to:</p>
+
+<div>
+<div>
+<pre class="source">keytool -genkeypair -alias jetty -keyalg RSA -dname 
&quot;CN=*.int.example.com&quot; -storepass password -keypass password
+</pre></div></div>
+
+<p>The above would work on any server in the int.example.com domain.</p>
+</li>
+<li>
+
+<p>Run the following command (as the Oozie user) to export a certificate file 
from the keystore file:</p>
+
+<div>
+<div>
+<pre class="source">keytool -exportcert -alias jetty -file 
path/to/anywhere/certificate.cert -storepass password
+</pre></div></div>
+</li>
+<li>
+
+<p>Run the following command (as any user) to create a truststore containing 
the certificate we just exported:</p>
+
+<div>
+<div>
+<pre class="source">keytool -import -alias jetty -file 
path/to/certificate.cert -keystore /path/to/anywhere/oozie.truststore 
-storepass password2
+</pre></div></div>
+
+<p>You&#x2019;ll need the <tt>oozie.truststore</tt> later if you&#x2019;re 
using the Oozie client (or other Java-based client); otherwise, you can skip 
this step.  The <tt>storepass</tt> value here is only used to verify or change 
the truststore and isn&#x2019;t typically required when only reading from it; 
so it does not have to be given to users only using the client.</p>
+</li>
+</ol></div>
+<div class="section">
+<h4><a name="To_use_a_Certificate_from_a_Certificate_Authority"></a>To use a 
Certificate from a Certificate Authority</h4>
+<ol style="list-style-type: decimal">
+
+<li>
+
+<p>You will need to make a request to a Certificate Authority in order to 
obtain a proper Certificate; please consult a Certificate Authority on this 
procedure.  If you&#x2019;re going to be using Oozie HA, it&#x2019;s simplest 
if you have a single certificate that all Oozie servers in the HA group can 
use.  To do that, you&#x2019;ll need to use a wildcard on the subdomain it belongs 
to (e.g. &#x201c;*.int.example.com&#x201d;).</p>
+</li>
+<li>
+
+<p>Once you have your .cert file, run the following command (as the Oozie 
user) to create a keystore file from your certificate:</p>
+
+<div>
+<div>
+<pre class="source">keytool -import -alias jetty -file path/to/certificate.cert
+</pre></div></div>
+
+<p>The keystore file will be named <tt>.keystore</tt> and located in the Oozie 
user&#x2019;s home directory.</p>
+</li>
+</ol></div>
+<div class="section">
+<h4><a name="Configure_the_Oozie_Server_to_use_SSL_HTTPS"></a>Configure the 
Oozie Server to use SSL (HTTPS)</h4>
+<ol style="list-style-type: decimal">
+
+<li>
+
+<p>Make sure the Oozie server isn&#x2019;t running</p>
+</li>
+<li>
+
+<p>Configure settings necessary for enabling SSL/TLS support in 
<tt>oozie-site.xml</tt>.</p>
+<p>2a. Set <tt>oozie.https.enabled</tt> to <tt>true</tt>. To revert back to 
HTTP, set <tt>oozie.https.enabled</tt> to <tt>false</tt>.</p>
+<p>2b. Set location and password for the keystore and location for truststore 
by setting <tt>oozie.https.keystore.file</tt>, 
<tt>oozie.https.keystore.pass</tt>, <tt>oozie.https.truststore.file</tt>.</p>
+<p><b>Note:</b> <tt>oozie.https.truststore.file</tt> can be overridden by 
setting <tt>javax.net.ssl.trustStore</tt> system property, 
<tt>oozie.https.keystore.pass</tt> by setting 
<tt>javax.net.ssl.trustStorePassword</tt>.</p>
+<p>The default HTTPS port Oozie listens on for secure connections is 11443; it 
can be changed via <tt>oozie.https.port</tt>.</p>
+<p>It is possible to specify other HTTPS settings via <tt>oozie-site.xml</tt>:</p>
+<ul>
+
+<li>To include / exclude cipher suites, set 
<tt>oozie.https.include.cipher.suites</tt> / 
<tt>oozie.https.exclude.cipher.suites</tt>.</li>
+<li>To include / exclude TLS 
protocols, set <tt>oozie.https.include.protocols</tt> / 
<tt>oozie.https.exclude.protocols</tt>.</li>
+</ul>
+<p><b>Note:</b> Exclude is always 
preferred over include (i.e. if you both include and exclude an entity, it will 
be excluded).</p>
+<p><b>Note:</b> When SSL is enabled, HTTP Strict-Transport-Security (HSTS) is 
also enabled. The default value for max-age is 31536000 (one year). This can be 
changed by setting the <tt>oozie.hsts.max.age.seconds</tt> property. Setting it to 
<tt>0</tt> or a negative value will disable HSTS.</p>
+</li>
+<li>
+
+<p>Start the Oozie server</p>
+<p><b>Note:</b> If using Oozie HA, make sure that each Oozie server has a copy 
of the .keystore file.</p>
+</li>
+</ol></div>
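+<p>Putting step 2 together, the HTTPS-related entries in <tt>oozie-site.xml</tt> might look like the following sketch. The property names are those named in the steps above; the keystore and truststore paths are hypothetical, <tt>password</tt> is simply the value used in the keytool examples, and 11443 is the documented default port.</p>
+
+<div>
+<div>
+<pre class="source">  &lt;!-- Sketch only: file paths are placeholder assumptions --&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.https.enabled&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.https.port&lt;/name&gt;
+    &lt;value&gt;11443&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.https.keystore.file&lt;/name&gt;
+    &lt;value&gt;/home/oozie/.keystore&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;!-- Must match the -storepass / -keypass used with keytool --&gt;
+    &lt;name&gt;oozie.https.keystore.pass&lt;/name&gt;
+    &lt;value&gt;password&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;oozie.https.truststore.file&lt;/name&gt;
+    &lt;value&gt;/home/oozie/oozie.truststore&lt;/value&gt;
+  &lt;/property&gt;
+</pre></div></div>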
+<div class="section">
+<h4><a 
name="Configure_the_Oozie_Client_to_connect_using_SSL_HTTPS"></a>Configure the 
Oozie Client to connect using SSL (HTTPS)</h4>
+<p>The first two steps are only necessary if you are using a Self-Signed 
Certificate; the third is required either way. Also, these steps must be done 
on every machine where you intend to use the Oozie Client.</p>
+<ol style="list-style-type: decimal">
+
+<li>
+
+<p>Copy or download the oozie.truststore file onto the client machine</p>
+</li>
+<li>
+
+<p>When using any Java-based program, you&#x2019;ll need to pass 
<tt>-Djavax.net.ssl.trustStore</tt> to the JVM.  To do this for the Oozie 
client:</p>
+
+<div>
+<div>
+<pre class="source">export 
OOZIE_CLIENT_OPTS='-Djavax.net.ssl.trustStore=/path/to/oozie.truststore'
+</pre></div></div>
+</li>
+<li>
+
+<p>When using the Oozie Client, you will need to use 
<tt>https://oozie.server.hostname:11443/oozie</tt> instead of 
<tt>http://oozie.server.hostname:11000/oozie</tt> &#x2013; Java will not 
automatically redirect from the http address to the https address.</p>
+</li>
+</ol></div>
+<div class="section">
+<h4><a name="Connect_to_the_Oozie_Web_UI_using_SSL_HTTPS"></a>Connect to the 
Oozie Web UI using SSL (HTTPS)</h4>
+<ol style="list-style-type: decimal">
+
+<li>Use <tt>https://oozie.server.hostname:11443/oozie</tt>, although most 
browsers should automatically redirect you if you use 
<tt>http://oozie.server.hostname:11000/oozie</tt>.
+<p><b>IMPORTANT</b>: If using a Self-Signed Certificate, your browser will 
warn you that it can&#x2019;t verify the certificate or something similar. You 
will probably have to add your certificate as an exception.</p></li>
+</ol></div>
+<div class="section">
+<h4><a name="Additional_considerations_for_Oozie_HA_with_SSL"></a>Additional 
considerations for Oozie HA with SSL</h4>
+<p>You&#x2019;ll need to configure the load balancer to do SSL pass-through.  
This will allow the clients talking to Oozie to use the SSL certificate 
provided by the Oozie servers (so the load balancer does not need one).  Please 
consult your load balancer&#x2019;s documentation on how to configure this.  
Make sure to point the load balancer at the <tt>https://HOST:HTTPS_PORT</tt> 
addresses for your Oozie servers.  Clients can then connect to the load 
balancer at <tt>https://LOAD_BALANCER_HOST:PORT</tt>.</p>
+<p><b>Important:</b> Callbacks from the ApplicationMaster are done via http or 
https depending on what you enter for the <tt>OOZIE_BASE_URL</tt> property.  If 
you are using a Certificate from a Certificate Authority, you can simply put 
the https address here. If you are using a self-signed certificate, you have to 
choose one of the following options (Option 1 is recommended):</p>
+<p>Option 1) You&#x2019;ll need to follow the steps in the <a 
href="AG_Install.html#Configure_the_Oozie_Client_to_connect_using_SSL_HTTPS">Configure 
the Oozie Client to connect using SSL (HTTPS)</a> section, but on the host of 
the ApplicationMaster.  You can then set <tt>OOZIE_BASE_URL</tt> to the load 
balancer https address. This will allow the ApplicationMaster to contact the 
Oozie server with https (like the Oozie client, it is also a Java 
program).</p>
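+<p>As a minimal sketch of Option 1, the callback URL could point at the https address of the load balancer. The host name <tt>my.loadbalancer.hostname</tt> and the port <tt>11443</tt> are placeholders borrowed from other examples on this page, not required values:</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.base.url&lt;/name&gt;
+    &lt;!-- assumed https address of the load balancer; adjust host and port --&gt;
+    &lt;value&gt;https://my.loadbalancer.hostname:11443/oozie&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>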
+<p>Option 2) You&#x2019;ll need to set up another load balancer, or another 
&#x201c;pool&#x201d; on the existing load balancer, with the http addresses of 
the Oozie servers.  You can then set <tt>OOZIE_BASE_URL</tt> to the load 
balancer http address.  Clients should use the https load balancer address.  
This will allow clients to use https while the ApplicationMaster uses http for 
callbacks.</p></div></div>
+<div class="section">
+<h3><a name="Fine_Tuning_an_Oozie_Server"></a>Fine Tuning an Oozie Server</h3>
+<p>Refer to the <a href="./oozie-default.xml">oozie-default.xml</a> for 
details.</p></div>
+<div class="section">
+<h3><a name="Using_Instrumentation_instead_of_Metrics"></a>Using 
Instrumentation instead of Metrics</h3>
+<p>As of version 4.1.0, Oozie includes a replacement for the Instrumentation 
based on Codahale&#x2019;s Metrics library.  It includes a number of 
improvements over the original Instrumentation included in Oozie.  They both 
report most of the same information, though the formatting is slightly 
different and there&#x2019;s some additional information in the Metrics 
version; the format of the output to the oozie-instrumentation log is also 
different.</p>
+<p>As of version 5.0.0, <tt>MetricsInstrumentationService</tt> is the default; 
it is listed in <tt>oozie.services</tt>:</p>
+
+<div>
+<div>
+<pre class="source">    &lt;property&gt;
+        &lt;name&gt;oozie.services&lt;/name&gt;
+        &lt;value&gt;
+            ...
+            org.apache.oozie.service.MetricsInstrumentationService,
+            ...
+        &lt;/value&gt;
+     &lt;/property&gt;
+</pre></div></div>
+
+<p>The deprecated <tt>InstrumentationService</tt> can be enabled by adding a 
reference to <tt>InstrumentationService</tt> to the list in 
<tt>oozie.services.ext</tt>:</p>
+
+<div>
+<div>
+<pre class="source">    &lt;property&gt;
+        &lt;name&gt;oozie.services.ext&lt;/name&gt;
+        &lt;value&gt;
+            ...
+            org.apache.oozie.service.InstrumentationService,
+            ...
+        &lt;/value&gt;
+     &lt;/property&gt;
+</pre></div></div>
+
+<p>By default the <tt>admin/instrumentation</tt> REST endpoint is no longer 
available; the <tt>admin/metrics</tt> endpoint can be used instead (see the 
<a href="WebServicesAPI.html#Oozie_Metrics">Web Services API</a> documentation 
for more details); the Oozie Web UI also replaces the 
&#x201c;Instrumentation&#x201d; tab with a &#x201c;Metrics&#x201d; tab.</p>
+<p>If the deprecated <tt>InstrumentationService</tt> is used, the 
<tt>admin/instrumentation</tt> REST endpoint is enabled and the 
<tt>admin/metrics</tt> REST endpoint is no longer available (see the <a 
href="WebServicesAPI.html#Oozie_Metrics">Web Services API</a> documentation for 
more details); the Oozie Web UI also replaces the &#x201c;Metrics&#x201d; tab 
with the &#x201c;Instrumentation&#x201d; tab.</p>
+<p>The instrumentation metrics can also be published to an external Graphite 
or Ganglia server. To do so, specify the following properties in 
oozie-site.xml:</p>
+
+<div>
+<div>
+<pre class="source">    &lt;property&gt;
+        &lt;name&gt;oozie.external_monitoring.enable&lt;/name&gt;
+        &lt;value&gt;false&lt;/value&gt;
+        &lt;description&gt;
+            If the oozie functional metrics need to be exposed to the 
metrics-server backend, set it to true.
+            If set to true, the following properties have to be specified: 
oozie.metrics.server.name,
+            oozie.metrics.host, oozie.metrics.prefix, 
oozie.metrics.report.interval.sec, oozie.metrics.port
+        &lt;/description&gt;
+    &lt;/property&gt;
+
+    &lt;property&gt;
+        &lt;name&gt;oozie.external_monitoring.type&lt;/name&gt;
+        &lt;value&gt;graphite&lt;/value&gt;
+        &lt;description&gt;
+            The type of server to which the metrics are sent: 
graphite or ganglia.
+        &lt;/description&gt;
+    &lt;/property&gt;
+
+    &lt;property&gt;
+        &lt;name&gt;oozie.external_monitoring.address&lt;/name&gt;
+        &lt;value&gt;http://localhost:2020&lt;/value&gt;
+    &lt;/property&gt;
+
+    &lt;property&gt;
+        &lt;name&gt;oozie.external_monitoring.metricPrefix&lt;/name&gt;
+        &lt;value&gt;oozie&lt;/value&gt;
+    &lt;/property&gt;
+
+    &lt;property&gt;
+        &lt;name&gt;oozie.external_monitoring.reporterIntervalSecs&lt;/name&gt;
+        &lt;value&gt;60&lt;/value&gt;
+    &lt;/property&gt;
+</pre></div></div>
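+
+<p>For illustration, a configuration that turns reporting on for a Graphite server might look like the following sketch. The address <tt>http://graphite.example.com:2003</tt> is a hypothetical placeholder that follows the form of the default value; the prefix and reporting interval are left at the defaults listed above:</p>
+
+<div>
+<div>
+<pre class="source">    &lt;property&gt;
+        &lt;name&gt;oozie.external_monitoring.enable&lt;/name&gt;
+        &lt;value&gt;true&lt;/value&gt;
+    &lt;/property&gt;
+
+    &lt;property&gt;
+        &lt;name&gt;oozie.external_monitoring.type&lt;/name&gt;
+        &lt;value&gt;graphite&lt;/value&gt;
+    &lt;/property&gt;
+
+    &lt;property&gt;
+        &lt;name&gt;oozie.external_monitoring.address&lt;/name&gt;
+        &lt;!-- hypothetical Graphite endpoint; replace with your own --&gt;
+        &lt;value&gt;http://graphite.example.com:2003&lt;/value&gt;
+    &lt;/property&gt;
+</pre></div></div>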
+
+<p>The instrumentation metrics can also be published via the JMX interface. To 
do so, specify the following property in oozie-site.xml:</p>
+
+<div>
+<div>
+<pre class="source">    &lt;property&gt;
+         &lt;name&gt;oozie.jmx_monitoring.enable&lt;/name&gt;
+         &lt;value&gt;false&lt;/value&gt;
+         &lt;description&gt;
+             If the oozie functional metrics need to be exposed via the JMX 
interface, set it to true.
+         &lt;/description&gt;
+     &lt;/property&gt;
+</pre></div></div>
+
+<p><a name="HA"></a></p></div>
+<div class="section">
+<h3><a name="High_Availability_HA"></a>High Availability (HA)</h3>
+<p>Multiple Oozie Servers can be configured against the same database to 
provide High Availability (HA) of the Oozie service.</p>
+<div class="section">
+<h4><a name="Pre-requisites"></a>Pre-requisites</h4>
+<ol style="list-style-type: decimal">
+
+<li>
+
+<p>A database that supports multiple concurrent connections.  In order to have 
full HA, the database should also have HA support, or it becomes a single point 
of failure.</p>
+<p><b>NOTE:</b> The default Derby database does not support this.</p>
+</li>
+<li>
+
+<p>A ZooKeeper ensemble.</p>
+<p>Apache ZooKeeper is a distributed, open-source coordination service for 
distributed applications; the Oozie servers use it for coordinating access to 
the database and communicating with each other.  In order to have full HA, 
there should be at least 3 ZooKeeper servers. More information on ZooKeeper can 
be found <a class="externalLink" 
href="http://zookeeper.apache.org">here</a>.</p>
+</li>
+<li>
+
+<p>Multiple Oozie servers.</p>
+<p><b>IMPORTANT:</b> While not strictly required for all configuration 
properties, all of the servers should ideally have exactly the same 
configuration for consistency&#x2019;s sake.</p>
+</li>
+<li>
+
+<p>A load balancer, Virtual IP, or Round-Robin DNS.</p>
+<p>This is used to provide a single entry-point for users and for callbacks 
from the JobTracker/ResourceManager.  The load balancer should be configured 
for round-robin between the Oozie servers to distribute the requests.  Users 
(using either the Oozie client, a web browser, or the REST API) should connect 
through the load balancer.  In order to have full HA, the load balancer should 
also have HA support, or it becomes a single point of failure.</p>
+</li>
+</ol></div>
+<div class="section">
+<h4><a name="InstallationConfiguration_Steps"></a>Installation/Configuration 
Steps</h4>
+<ol style="list-style-type: decimal">
+
+<li>
+
+<p>Install identically configured Oozie servers normally.  Make sure they are 
all configured against the same database and make sure that you DO NOT start 
them yet.</p>
+</li>
+<li>
+
+<p>Add the following services to the extension services configuration property 
in oozie-site.xml in all Oozie servers.  This will make Oozie use the ZooKeeper 
versions of these services instead of the default implementations.</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.services.ext&lt;/name&gt;
+    &lt;value&gt;
+        org.apache.oozie.service.ZKLocksService,
+        org.apache.oozie.service.ZKXLogStreamingService,
+        org.apache.oozie.service.ZKJobsConcurrencyService,
+        org.apache.oozie.service.ZKUUIDService
+    &lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+</li>
+<li>
+
+<p>Add the following property to oozie-site.xml in all Oozie servers.  It 
should be a comma-separated list of host:port pairs of the ZooKeeper servers.  
The default value is shown below.</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+   &lt;name&gt;oozie.zookeeper.connection.string&lt;/name&gt;
+   &lt;value&gt;localhost:2181&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+</li>
+<li>
+
+<p>(Optional) Add the following property to oozie-site.xml in all Oozie 
servers to specify the namespace to use.  All of the Oozie Servers that are 
planning on talking to each other should have the same namespace.  If there are 
multiple Oozie setups each doing their own HA, they should have their own 
namespace.  The default value is shown below.</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.zookeeper.namespace&lt;/name&gt;
+    &lt;value&gt;oozie&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+</li>
+<li>
+
+<p>Change the value of <tt>OOZIE_BASE_URL</tt> in oozie-site.xml to point to 
the load balancer or virtual IP, for example:</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.base.url&lt;/name&gt;
+    &lt;value&gt;http://my.loadbalancer.hostname:11000/oozie&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+</li>
+<li>
+
+<p>(Optional) If using a secure cluster, see <a 
href="AG_Install.html#Security">Security</a> below on configuring Kerberos with 
Oozie HA.</p>
+</li>
+<li>
+
+<p>Start the ZooKeeper servers.</p>
+</li>
+<li>
+
+<p>Start the Oozie servers.</p>
+<p>Note: If one of the Oozie servers becomes unavailable, job logs retrieved 
through the Web UI, REST API, or client may be missing information until that 
server comes back up.</p>
+</li>
+</ol></div>
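+<p>As a sketch of step 3 for a three-node ZooKeeper ensemble (the minimum suggested in the pre-requisites), the connection string could list all three servers. The host names <tt>zk1.example.com</tt>, <tt>zk2.example.com</tt> and <tt>zk3.example.com</tt> are hypothetical; <tt>2181</tt> is the port used in the default value:</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+   &lt;name&gt;oozie.zookeeper.connection.string&lt;/name&gt;
+   &lt;!-- hypothetical ensemble members, as comma-separated host:port pairs --&gt;
+   &lt;value&gt;zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>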
+<div class="section">
+<h4><a name="Security"></a>Security</h4>
+<p>Oozie HA works with the existing Oozie security framework and settings. For 
HA features (log streaming, share lib, etc.) to work properly in a secure setup, 
the following property can be set on each server. If 
<tt>oozie.server.authentication.type</tt> is not set, then server-server 
authentication will fall back on <tt>oozie.authentication.type</tt>.</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.server.authentication.type&lt;/name&gt;
+    &lt;value&gt;kerberos&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+
+<p>Below are some additional steps and information specific to Oozie HA:</p>
+<ol style="list-style-type: decimal">
+
+<li>
+
+<p>(Optional) To prevent unauthorized users or programs from interacting with 
or reading the znodes used by Oozie in ZooKeeper, you can tell Oozie to use 
Kerberos-backed ACLs.  To enforce this for all of the Oozie-related znodes, 
simply add the following property to oozie-site.xml in all Oozie servers and 
set it to <tt>true</tt>.  The default is <tt>false</tt>.</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.zookeeper.secure&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+
+<p>Note: The Kerberos principals of each of the Oozie servers should have the 
same primary name (i.e. in <tt>primary/instance@REALM</tt>, each server should 
have the same value for <tt>primary</tt>).</p>
+<p><b>Important:</b> Once this property is set to <tt>true</tt>, it will set 
the ACLs on all existing Oozie-related znodes to only allow Kerberos 
authenticated users with a principal that has the same primary as described 
above (also for any subsequently created new znodes). This means that if you 
ever want to turn this feature off, you will have to manually connect to 
ZooKeeper using a Kerberos principal with the same primary and delete 
all znodes under and including the namespace (e.g. if 
<tt>oozie.zookeeper.namespace=oozie</tt>, then that would be 
<tt>/oozie</tt>); alternatively, instead of deleting them all, you can manually 
set all of their ACLs to <tt>world:anyone</tt>. In either case, make sure that 
no Oozie servers are running while this is being done.</p>
+<p>Also, in your zoo.cfg for ZooKeeper, make sure to set the following 
properties:</p>
+
+<div>
+<div>
+<pre 
class="source">authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
+kerberos.removeHostFromPrincipal=true
+kerberos.removeRealmFromPrincipal=true
+</pre></div></div>
+</li>
+<li>
+
+<p>Prior to Hadoop 2.5.0, there is a known limitation where each Oozie 
server can only use one HTTP principal.  However, for Oozie HA, we need to use 
two HTTP principals: <tt>HTTP/oozie-server-host@realm</tt> and 
<tt>HTTP/load-balancer-host@realm</tt>.  This allows access to each Oozie 
server directly and through the load balancer.  While users should always go 
through the load balancer, certain features (e.g. log streaming) require the 
Oozie servers to talk to each other directly; it can also be helpful for an 
administrator to talk directly to an Oozie server.  So, if using a Hadoop 
version prior to 2.5.0, you will have to choose which HTTP principal to use as 
you cannot use both; it is recommended to choose 
<tt>HTTP/load-balancer-host@realm</tt> so users can connect through the load 
balancer.  This will prevent Oozie servers from talking to each other directly, 
which will effectively disable log streaming.</p>
+<p>For Hadoop 2.5.0 and later:</p>
+<p>2a. When creating the keytab used by Oozie, make sure to include 
Oozie&#x2019;s principal and the two HTTP principals mentioned above.</p>
+<p>2b. Set <tt>oozie.authentication.kerberos.principal</tt> to * (that is, an 
asterisk) so it will use both HTTP principals.</p>
+<p>For earlier versions of Hadoop:</p>
+<p>2a. When creating the keytab used by Oozie, make sure to include 
Oozie&#x2019;s principal and the load balancer HTTP principal.</p>
+<p>2b. Set <tt>oozie.authentication.kerberos.principal</tt> to 
<tt>HTTP/load-balancer-host@realm</tt>.</p>
+</li>
+<li>
+
+<p>With Hadoop 2.6.0 and later, a rolling random secret that is synchronized 
across all Oozie servers will be used for signing the Oozie auth tokens.  This 
is done automatically when HA is enabled; no additional configuration is 
needed.</p>
+<p>For earlier versions of Hadoop, each server will have a different random 
secret.  This will still work but will likely result in additional calls to the 
KDC to authenticate users to the Oozie server (because the auth tokens will not 
be accepted by other servers, which will cause a fallback to Kerberos).</p>
+</li>
+<li>
+
+<p>If you&#x2019;d like to use HTTPS (SSL) with Oozie HA, there are some 
additional considerations to take into account. See the <a 
href="AG_Install.html#Setting_Up_Oozie_with_HTTPS_SSL">Setting Up Oozie with 
HTTPS (SSL)</a> section for more information.</p>
+</li>
+</ol></div>
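+<p>For Hadoop 2.5.0 and later, step 2b above could be expressed in oozie-site.xml roughly as follows. This is only a sketch; the principal named in the comment is the placeholder used in the text above, not a real principal:</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.authentication.kerberos.principal&lt;/name&gt;
+    &lt;!-- for Hadoop versions prior to 2.5.0, use
+         HTTP/load-balancer-host@realm here instead of the asterisk --&gt;
+    &lt;value&gt;*&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>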
+<div class="section">
+<h4><a name="JobId_sequence"></a>JobId sequence</h4>
+<p>In HA mode, Oozie uses ZooKeeper to generate the job id sequence. Job ids have the 
following format: <tt>&lt;Id sequence&gt;-&lt;yyMMddHHmmss(server start 
time)&gt;-&lt;systemId&gt;-&lt;W/C/B&gt;</tt></p>
+<p>Here, <tt>&lt;systemId&gt;</tt> is configured as <tt>oozie.system.id</tt> 
(default is &#x201c;oozie-&#x201d; + &#x201c;user.name&#x201d;), and W/C/B is a suffix 
to the job id indicating whether the generated job is a workflow, coordinator, or 
bundle.</p>
+<p>The maximum number of characters allowed for the job id sequence is 40. The &#x201c;Id 
sequence&#x201d; is stored in ZooKeeper and is reset to 0 once the maximum job id sequence is 
reached. The maximum job id sequence is configured as 
<tt>oozie.service.ZKUUIDService.jobid.sequence.max</tt>; the default value is 
99999999990.</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.service.ZKUUIDService.jobid.sequence.max&lt;/name&gt;
+    &lt;value&gt;99999999990&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
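+
+<p>To illustrate how the parts of a job id fit together, the sketch below sets a custom system id. The value <tt>oozie-prod</tt> is hypothetical, and the id in the comment only illustrates the format described above; the actual sequence value and its zero padding depend on the server:</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.system.id&lt;/name&gt;
+    &lt;!-- hypothetical system id --&gt;
+    &lt;value&gt;oozie-prod&lt;/value&gt;
+&lt;/property&gt;
+&lt;!-- a workflow job on a server started at 2021-02-26 14:14:19 would then
+     get an id of the form 0000042-210226141419-oozie-prod-W --&gt;
+</pre></div></div>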
+</div></div></div>
+<div class="section">
+<h2><a name="Starting_and_Stopping_Oozie"></a>Starting and Stopping Oozie</h2>
+<p>Use the standard commands to start and stop Oozie.</p></div>
+<div class="section">
+<h2><a name="Oozie_Command_Line_Installation"></a>Oozie Command Line 
Installation</h2>
+<p>Copy and expand the <tt>oozie-client</tt> TAR.GZ file bundled with the 
distribution. Add the <tt>bin/</tt> directory to the <tt>PATH</tt>.</p>
+<p>Refer to the <a href="DG_CommandLineTool.html">Command Line Interface 
Utilities</a> document for a full reference of the <tt>oozie</tt> command line 
tool.</p></div>
+<div class="section">
+<h2><a name="Oozie_Share_Lib"></a>Oozie Share Lib</h2>
+<p>The Oozie sharelib TAR.GZ file bundled with the distribution contains the 
necessary files to run Oozie map-reduce streaming, pig, hive, sqoop, and 
distcp actions.  There is also a sharelib for HCatalog.  The sharelib is 
required for these actions to work; any other actions (mapreduce, shell, ssh, 
and java) do not require the sharelib to be installed.</p>
+<p>As of Oozie 4.0, the following property is included.  If true, Oozie will 
create and ship a &#x201c;launcher jar&#x201d; to hdfs that contains classes 
necessary for the launcher job.  If false, Oozie will not do this, and it is 
assumed that the necessary classes are in their respective sharelib jars or the 
&#x201c;oozie&#x201d; sharelib instead.  When false, the sharelib is required 
for ALL actions; when true, the sharelib is only required for actions that need 
additional jars (the original list from above).</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.action.ship.launcher.jar&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+
+<p>When the sharelib CLI is used, sharelib files are copied to a new 
lib_<tt>&lt;timestamped&gt;</tt> directory. At startup, the server picks the sharelib 
from the directory with the latest timestamp. While starting, the server also purges sharelib 
directories that are older than the sharelib retention period (defined by 
<tt>oozie.service.ShareLibService.temp.sharelib.retention.days</tt>; the default is 7 
days).</p>
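+<p>As a sketch, the retention period could be changed in oozie-site.xml as shown below. The value <tt>14</tt> is an arbitrary example, not a recommendation:</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.service.ShareLibService.temp.sharelib.retention.days&lt;/name&gt;
+    &lt;!-- example value; the default is 7 --&gt;
+    &lt;value&gt;14&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>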
+<p>A sharelib mapping file can also be configured. The file is a key-value 
mapping, where the key is the sharelib name for the action and the value is 
a comma-separated list of DFS or local filesystem directories or jar files. 
Local filesystem refers to the local filesystem of the node where the Oozie 
launcher is running. This can be configured in oozie-site.xml as:</p>
+
+<div>
+<div>
+<pre class="source">  &lt;!-- OOZIE --&gt;
+    &lt;property&gt;
+        &lt;name&gt;oozie.service.ShareLibService.mapping.file&lt;/name&gt;
+        &lt;value&gt;&lt;/value&gt;
+        &lt;description&gt;
+            The sharelib mapping file contains a list of key=value pairs,
+            where the key is the sharelib name for the action and the value is a 
comma-separated list of
+            DFS or local filesystem directories or jar files.
+            Example:
+            oozie.pig_10=hdfs:///share/lib/pig/pig-0.10.1/lib/
+            oozie.pig=hdfs:///share/lib/pig/pig-0.11.1/lib/
+            
oozie.distcp=hdfs:///share/lib/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-distcp-2.2.0.jar
+            
oozie.spark=hdfs:///share/lib/spark/lib/,hdfs:///share/lib/spark/python/lib/pyspark.zip,hdfs:///share/lib/spark/python/lib/py4j-0-9-src.zip
+            oozie.hive=file:///usr/local/oozie/share/lib/hive/
+        &lt;/description&gt;
+    &lt;/property&gt;
+</pre></div></div>
+
+<p>Example mapping file with local filesystem resources:</p>
+
+<div>
+<div>
+<pre class="source">    &lt;property&gt;
+        &lt;name&gt;oozie.service.ShareLibService.mapping.file&lt;/name&gt;
+        &lt;value&gt;
+            oozie.distcp=file:///usr/local/oozie/share/lib/distcp
+            oozie.hcatalog=file:///usr/local/oozie/share/lib/hcatalog
+            oozie.hive=file:///usr/local/oozie/share/lib/hive
+            oozie.hive2=file:///usr/local/oozie/share/lib/hive2
+            
oozie.mapreduce-streaming=file:///usr/local/oozie/share/lib/mapreduce-streaming
+            oozie.oozie=file:///usr/local/oozie/share/lib/oozie
+            oozie.pig=file:///usr/local/oozie/share/lib/pig
+            oozie.spark=file:///usr/local/oozie/share/lib/spark
+            oozie.sqoop=file:///usr/local/oozie/share/lib/sqoop
+        &lt;/value&gt;
+    &lt;/property&gt;
+</pre></div></div>
+
+<p>If you are using local filesystem resources in the mapping file, make sure 
the corresponding jars are already deployed to all the nodes where Oozie launcher 
jobs will be executed, and that the files are readable by the launchers. To do this, 
you can extract the Oozie sharelib TAR.GZ file in a directory of your choice on 
the nodes and set the permissions of the files accordingly.</p>
+<p>The Oozie sharelib TAR.GZ file bundled with the distribution does not contain 
the pyspark and py4j zip files, since they vary with the Apache Spark version. 
Therefore, to run pySpark using the Spark Action, users need to provide the pyspark and 
py4j zip files. These files can be added to the workflow&#x2019;s lib/ 
directory, to the sharelib, or to the sharelib mapping file.</p></div>
+<div class="section">
+<h2><a name="Oozie_CoordinatorsBundles_Processing_Timezone"></a>Oozie 
Coordinators/Bundles Processing Timezone</h2>
+<p>By default, Oozie runs coordinator and bundle jobs using the <tt>UTC</tt> 
timezone for datetime values specified in the application XML and in the job 
parameter properties. This includes the start and end times of coordinator 
jobs, the initial-instance of coordinator datasets, and the kickoff times of 
bundle applications. In addition, coordinator dataset instance URI templates will be 
resolved using datetime values of the Oozie processing timezone.</p>
+<p>It is possible to set the Oozie processing timezone to a timezone that is 
an offset of UTC. Alternate timezones must be expressed using a GMT offset 
(<tt>GMT+/-####</tt>). For example: <tt>GMT+0530</tt> (India timezone).</p>
+<p>To change the default <tt>UTC</tt> timezone, use the 
<tt>oozie.processing.timezone</tt> property in the <tt>oozie-site.xml</tt>. For 
example:</p>
+
+<div>
+<div>
+<pre class="source">&lt;configuration&gt;
+    &lt;property&gt;
+        &lt;name&gt;oozie.processing.timezone&lt;/name&gt;
+        &lt;value&gt;GMT+0530&lt;/value&gt;
+    &lt;/property&gt;
+&lt;/configuration&gt;
+</pre></div></div>
+
+<p><b>IMPORTANT:</b> If using a processing timezone other than <tt>UTC</tt>, 
all datetime values in coordinator and bundle jobs must be expressed in the 
corresponding timezone, for example <tt>2012-08-08T12:42+0530</tt>.</p>
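+<p>As an illustrative fragment, a coordinator application running under a <tt>GMT+0530</tt> processing timezone would carry that offset in its datetime attributes. The application name, the frequency, the end time and the <tt>timezone</tt> attribute value below are assumptions made for the sake of the example, and the rest of the definition is omitted:</p>
+
+<div>
+<div>
+<pre class="source">&lt;coordinator-app name="example-coord" frequency="${coord:days(1)}"
+                 start="2012-08-08T12:42+0530" end="2012-08-15T12:42+0530"
+                 timezone="Asia/Kolkata"
+                 xmlns="uri:oozie:coordinator:0.4"&gt;
+    &lt;!-- datasets, input/output events and the action are omitted here --&gt;
+&lt;/coordinator-app&gt;
+</pre></div></div>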
+<p><b>NOTE:</b> It is strongly encouraged to use <tt>UTC</tt>, the default 
Oozie processing timezone.</p>
+<p>For more details on using an alternate Oozie processing timezone, please 
refer to the <a href="CoordinatorFunctionalSpec.html#datetime">Coordinator 
Functional Specification, section &#x2018;4. Datetime&#x2019;</a>.</p>
+<p><a name="UberJar"></a></p></div>
+<div class="section">
+<h2><a name="MapReduce_Workflow_Uber_Jars"></a>MapReduce Workflow Uber 
Jars</h2>
+<p>For Map-Reduce jobs (not including streaming or pipes), additional jar 
files can also be included via an uber jar. An uber jar is a jar file that 
contains additional jar files within a &#x201c;lib&#x201d; folder (see <a 
href="WorkflowFunctionalSpec.html#AppDeployment">Workflow Functional 
Specification</a> for more information). Submitting a workflow with an uber jar 
requires at least Hadoop 2.2.0 or 1.2.0. As such, using uber jars in a workflow 
is disabled by default. To enable this feature, use the 
<tt>oozie.action.mapreduce.uber.jar.enable</tt> property in the 
<tt>oozie-site.xml</tt> (and make sure to use a supported version of 
Hadoop).</p>
+
+<div>
+<div>
+<pre class="source">&lt;configuration&gt;
+    &lt;property&gt;
+        &lt;name&gt;oozie.action.mapreduce.uber.jar.enable&lt;/name&gt;
+        &lt;value&gt;true&lt;/value&gt;
+    &lt;/property&gt;
+&lt;/configuration&gt;
+</pre></div></div>
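+
+<p>Once the feature is enabled, a map-reduce action can reference its uber jar. The fragment below is a sketch that assumes the jar is named through the <tt>oozie.mapreduce.uber.jar</tt> action property described in the Workflow Functional Specification; the jar path is hypothetical and the remaining action elements are omitted:</p>
+
+<div>
+<div>
+<pre class="source">&lt;map-reduce&gt;
+    &lt;!-- job-tracker, name-node and other elements omitted --&gt;
+    &lt;configuration&gt;
+        &lt;property&gt;
+            &lt;name&gt;oozie.mapreduce.uber.jar&lt;/name&gt;
+            &lt;!-- hypothetical location of the uber jar --&gt;
+            &lt;value&gt;${nameNode}/user/${wf:user()}/my-uber-jar.jar&lt;/value&gt;
+        &lt;/property&gt;
+    &lt;/configuration&gt;
+&lt;/map-reduce&gt;
+</pre></div></div>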
+</div>
+<div class="section">
+<h2><a name="AdvancedCustom_Environment_Settings"></a>Advanced/Custom 
Environment Settings</h2>
+<p>Oozie can be configured to use the Unix standard filesystem hierarchy for its 
different files (configuration, logs, data and temporary files).</p>
+<p>These settings must be made in the <tt>bin/oozie-env.sh</tt> script.</p>
+<p>This script is sourced before the configuration <tt>oozie-env.sh</tt> and 
supports additional environment variables (shown with their default values):</p>
+
+<div>
+<div>
+<pre class="source">export OOZIE_CONFIG=${OOZIE_HOME}/conf
+export OOZIE_DATA=${OOZIE_HOME}/data
+export OOZIE_LOG=${OOZIE_HOME}/logs
+export JETTY_OUT=${OOZIE_LOG}/jetty.out
+export JETTY_PID=/tmp/oozie.pid
+</pre></div></div>
+
+<p>Sample values to make Oozie follow Unix standard filesystem hierarchy:</p>
+
+<div>
+<div>
+<pre class="source">export OOZIE_CONFIG=/etc/oozie
+export OOZIE_DATA=/var/lib/oozie
+export OOZIE_LOG=/var/log/oozie
+export JETTY_PID=/tmp/oozie.pid
+</pre></div></div>
+
+<p><a href="index.html">::Go back to Oozie Documentation Index::</a></p></div>
+<div class="section">
+<h2><a name="Oozie_behind_a_trusted_proxy"></a>Oozie behind a trusted 
proxy</h2>
+<p>Oozie can be configured to work behind a proxy server (e.g. Apache Knox) 
that handles the Kerberos authentication for incoming requests. In this 
case, the command line client can be configured to use basic authentication, 
with a custom user name and password, to authenticate with Knox. The 
advantage is that the client does not need Kerberos to be set up.</p></div>
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+                      <div class="row-fluid">
+                                      <p >Copyright &copy;                    
2021

[... 12 lines stripped ...]
