Added: knox/trunk/books/1.4.0/service_avatica.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_avatica.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_avatica.md (added)
+++ knox/trunk/books/1.4.0/service_avatica.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,100 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Avatica ###
+
+Knox provides gateway functionality for access to all Apache Avatica-based servers.
+The gateway can be used to provide authentication and encryption for clients to
+servers like the Apache Phoenix Query Server.
+
+#### Gateway configuration ####
+
+The Gateway can be configured for Avatica by modifying the topology XML file
+and providing a new service XML file.
+
+In the topology XML file, add the following with the correct hostname:
+
+    <service>
+      <role>AVATICA</role>
+      <url>http://avatica:8765</url>
+    </service>
+
+Your installation likely already contains the following service files. Ensure
+that they are present in your installation. In `services/avatica/1.9.0/rewrite.xml`:
+
+    <rules>
+        <rule dir="IN" name="AVATICA/avatica/inbound/root" 
pattern="*://*:*/**/avatica/">
+            <rewrite template="{$serviceUrl[AVATICA]}/"/>
+        </rule>
+        <rule dir="IN" name="AVATICA/avatica/inbound/path" 
pattern="*://*:*/**/avatica/{**}">
+            <rewrite template="{$serviceUrl[AVATICA]}/{**}"/>
+        </rule>
+    </rules>
+
+And in `services/avatica/1.9.0/service.xml`:
+
+    <service role="AVATICA" name="avatica" version="1.9.0">
+        <policies>
+            <policy role="webappsec"/>
+            <policy role="authentication"/>
+            <policy role="rewrite"/>
+            <policy role="authorization"/>
+        </policies>
+        <routes>
+            <route path="/avatica">
+                <rewrite apply="AVATICA/avatica/inbound/root" to="request.url"/>
+            </route>
+            <route path="/avatica/**">
+                <rewrite apply="AVATICA/avatica/inbound/path" to="request.url"/>
+            </route>
+        </routes>
+    </service>
+
+#### JDBC Drivers ####
+
+In most cases, users only need to modify the hostname of the Avatica server to
+instead be the Knox Gateway. To enable authentication, some of the Avatica
+properties need to be added to the Properties object used when constructing the
+`Connection` or to the JDBC URL directly.
+
+The JDBC URL can be modified like:
+
+    jdbc:avatica:remote:url=https://knox_gateway.domain:8443/gateway/sandbox/avatica;avatica_user=username;avatica_password=password;authentication=BASIC
+
+Or, using the `Properties` class:
+
+    Properties props = new Properties();
+    props.setProperty("avatica_user", "username");
+    props.setProperty("avatica_password", "password");
+    props.setProperty("authentication", "BASIC");
+    DriverManager.getConnection(url, props);
+
+Additionally, when the TLS certificate of the Knox Gateway is not trusted by your JVM installation,
+it will be necessary for you to pass in a custom truststore and truststore password to perform the
+necessary TLS handshake. This can be realized with the `truststore` and `truststore_password` properties
+using the same approaches as above.
+
+Via the JDBC URL:
+
+    jdbc:avatica:remote:url=https://...;authentication=BASIC;truststore=/tmp/knox_truststore.jks;truststore_password=very_secret
+
+Using Java code:
+
+    ...
+    props.setProperty("truststore", "/tmp/knox_truststore.jks");
+    props.setProperty("truststore_password", "very_secret");
+    DriverManager.getConnection(url, props);
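+
+For reference, a complete connection sketch that combines the pieces above (the gateway host,
+credentials, query and truststore path are placeholders that must be adjusted for your environment,
+and the Avatica thin-client JDBC driver is assumed to be on the classpath):
+
+    import java.sql.Connection;
+    import java.sql.DriverManager;
+    import java.sql.ResultSet;
+    import java.sql.Statement;
+    import java.util.Properties;
+
+    public class KnoxAvaticaSample {
+      public static void main( String[] args ) throws Exception {
+        String url = "jdbc:avatica:remote:url=https://knox_gateway.domain:8443/gateway/sandbox/avatica";
+
+        // Authentication and TLS settings passed via Properties instead of the JDBC URL
+        Properties props = new Properties();
+        props.setProperty( "avatica_user", "username" );
+        props.setProperty( "avatica_password", "password" );
+        props.setProperty( "authentication", "BASIC" );
+        props.setProperty( "truststore", "/tmp/knox_truststore.jks" );
+        props.setProperty( "truststore_password", "very_secret" );
+
+        try ( Connection connection = DriverManager.getConnection( url, props );
+              Statement statement = connection.createStatement();
+              // Replace with a query that is valid for your Avatica-backed server
+              ResultSet resultSet = statement.executeQuery( "SELECT * FROM my_table" ) ) {
+          while ( resultSet.next() ) {
+            System.out.println( resultSet.getString( 1 ) );
+          }
+        }
+      }
+    }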

Added: knox/trunk/books/1.4.0/service_config.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_config.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_config.md (added)
+++ knox/trunk/books/1.4.0/service_config.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,41 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Common Service Config ###
+
+It is possible to override a few of the global configuration settings provided in gateway-site.xml at the service level.
+These overrides are specified as name/value pairs within the \<service> elements of a particular service.
+The overridden settings apply only to that service.
+
+The following table shows the common configuration settings available at the service level via service level parameters.
+Individual services may support additional service level parameters.
+
+Property | Description | Default
+---------|-------------|---------
+httpclient.maxConnections    | The maximum number of connections that a single httpclient will maintain to a single host:port. | 32
+httpclient.connectionTimeout | The amount of time to wait when attempting a connection. The natural unit is milliseconds, but a 's' or 'm' suffix may be used for seconds or minutes respectively. The default timeout is system dependent. | 20s
+httpclient.socketTimeout     | The amount of time to wait for data on a socket before aborting the connection. The natural unit is milliseconds, but a 's' or 'm' suffix may be used for seconds or minutes respectively. The default timeout is system dependent but is likely to be indefinite. | 20s
+
+The example below demonstrates how these service level parameters are used.
+
+    <service>
+         <role>HIVE</role>
+         <param>
+             <name>httpclient.socketTimeout</name>
+             <value>180s</value>
+         </param>
+    </service>
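+
+Multiple overrides can be combined within the same \<service> element. The sketch below shows all
+three parameters from the table set at once; the values are illustrative only:
+
+    <service>
+         <role>HIVE</role>
+         <param>
+             <name>httpclient.maxConnections</name>
+             <value>64</value>
+         </param>
+         <param>
+             <name>httpclient.connectionTimeout</name>
+             <value>30s</value>
+         </param>
+         <param>
+             <name>httpclient.socketTimeout</name>
+             <value>3m</value>
+         </param>
+    </service>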

Added: knox/trunk/books/1.4.0/service_default_ha.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_default_ha.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_default_ha.md (added)
+++ knox/trunk/books/1.4.0/service_default_ha.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,101 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Default Service HA support ###
+
+Knox provides connectivity-based failover functionality for service calls that can be made to more than one server
+instance in a cluster. To enable this functionality, HaProvider configuration needs to be enabled for the service and
+the service itself needs to be configured with more than one URL in the topology file.
+
+The default HA functionality works on a simple round robin algorithm, where the top of the list of URLs is always used
+to route all of a service's REST calls until a connection error occurs. The top URL is then put at the bottom of the
+list and the next URL is attempted. This goes on until the setting of 'maxFailoverAttempts' is reached.
+
+At present the following services can use this default High Availability functionality and have been tested with it:
+
+* WEBHCAT
+* HBASE
+* OOZIE
+
+To enable HA functionality for a service in Knox the following configuration has to be added to the topology file.
+
+    <provider>
+         <role>ha</role>
+         <name>HaProvider</name>
+         <enabled>true</enabled>
+         <param>
+             <name>{SERVICE}</name>
+             <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
+         </param>
+    </provider>
+
+The role and name of the provider above must be as shown. The name in the 'param' section, i.e. `{SERVICE}`, must match
+the service role name that is being configured for HA, and the value in the 'param' section is the configuration
+for that particular service in HA mode. For example, the value of `{SERVICE}` can be 'WEBHCAT', 'HBASE' or 'OOZIE'.
+
+To configure multiple services in HA mode, additional 'param' sections can be added.
+
+For example,
+
+    <provider>
+         <role>ha</role>
+         <name>HaProvider</name>
+         <enabled>true</enabled>
+         <param>
+             <name>OOZIE</name>
+             <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
+         </param>
+         <param>
+             <name>HBASE</name>
+             <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
+         </param>
+         <param>
+             <name>WEBHCAT</name>
+             <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
+         </param>
+    </provider>
+
+The various configuration parameters are described below:
+
+* maxFailoverAttempts -
+This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic
+in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom
+of the list. If the list is exhausted and the maximum number of attempts is not reached then the first URL will be tried
+again.
+
+* failoverSleep -
+The amount of time in millis that the process will wait or sleep before attempting to failover.
+
+* enabled -
+Flag to turn the particular service on or off for HA.
+
+And for the service configuration itself the additional URLs should be added to the list.
+
+    <service>
+        <role>{SERVICE}</role>
+        <url>http://host1:port1</url>
+        <url>http://host2:port2</url>
+    </service>
+
+For example,
+
+    <service>
+        <role>OOZIE</role>
+        <url>http://sandbox1:11000/oozie</url>
+        <url>http://sandbox2:11000/oozie</url>
+    </service>
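+
+Putting the two pieces together, a minimal topology fragment that enables HA for Oozie could look
+like the sketch below (other providers such as authentication are omitted and the hostnames are
+illustrative):
+
+    <topology>
+        <gateway>
+            <provider>
+                <role>ha</role>
+                <name>HaProvider</name>
+                <enabled>true</enabled>
+                <param>
+                    <name>OOZIE</name>
+                    <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
+                </param>
+            </provider>
+        </gateway>
+        <service>
+            <role>OOZIE</role>
+            <url>http://sandbox1:11000/oozie</url>
+            <url>http://sandbox2:11000/oozie</url>
+        </service>
+    </topology>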

Added: knox/trunk/books/1.4.0/service_elasticsearch.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_elasticsearch.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_elasticsearch.md (added)
+++ knox/trunk/books/1.4.0/service_elasticsearch.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,159 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Elasticsearch ###
+
+Elasticsearch provides a REST API for communicating with Elasticsearch via JSON over HTTP. Elasticsearch uses X-Pack to do its own security (authentication and authorization). Therefore, the Knox Gateway forwards the user credentials to Elasticsearch and treats the Elasticsearch-authenticated user as "anonymous" to the backend service via a doas query param, while Knox authenticates to backend services as itself.
+
+#### Gateway configuration ####
+
+The Gateway can be configured for Elasticsearch by modifying the topology XML file and providing a new service XML file.
+
+In the topology XML file, add the following new service named "ELASTICSEARCH" with the correct elasticsearch-rest-server hostname and port number (e.g., 9200):
+
+     <service>
+       <role>ELASTICSEARCH</role>
+       <url>http://<elasticsearch-rest-server>:9200/</url>
+       <name>elasticsearch</name>
+     </service>
+
+#### Elasticsearch via Knox Gateway ####
+
+After adding the above to a topology, you can make a cURL request similar to the following structures:
+
+##### 1.  Elasticsearch Node Root Query #####
+
+    curl -i -k -u username:password -H "Accept: application/json" -X GET "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch"
+
+or
+
+    curl -i -k -u username:password -H "Accept: application/json" -X GET "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch/"
+
+The quotation marks around the URL can be single quotes or double quotes on both sides, and can also be omitted (note: this is true for all other Elasticsearch queries via Knox). Below is an example response:
+
+     HTTP/1.1 200 OK
+     Date: Wed, 23 May 2018 16:36:34 GMT
+     Content-Type: application/json; charset=UTF-8
+     Content-Length: 356
+     Server: Jetty(9.2.15.v20160210)
+     
+     {"name":"w0A80p0","cluster_name":"elasticsearch","cluster_uuid":"poU7j48pSpu5qQONr64HLQ","version":{"number":"6.2.4","build_hash":"ccec39f","build_date":"2018-04-12T20:37:28.497551Z","build_snapshot":false,"lucene_version":"7.2.1","minimum_wire_compatibility_version":"5.6.0","minimum_index_compatibility_version":"5.0.0"},"tagline":"You Know, for Search"}
+    
+##### 2.  Elasticsearch Index - Creation, Deletion, Refreshing and Data Operations - Writing, Updating and Retrieval #####
+
+###### (1) Index Creation ######
+
+    curl -i -k -u username:password -H "Content-Type: application/json" -X PUT "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch/{index-name}" -d '{
+    "settings" : {
+        "index" : {
+            "number_of_shards" : {index-shards-number},
+            "number_of_replicas" : {index-replicas-number}
+        }
+      }
+    }'
+
+Below is an example response:
+
+     HTTP/1.1 200 OK
+     Date: Wed, 23 May 2018 16:51:31 GMT
+     Content-Type: application/json; charset=UTF-8
+     Content-Length: 65
+     Server: Jetty(9.2.15.v20160210)
+     
+     {"acknowledged":true,"shards_acknowledged":true,"index":"estest"}
+
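+The same request with the index settings filled in; the index name matches the example response
+above and the shard and replica counts are illustrative:
+
+    curl -i -k -u username:password -H "Content-Type: application/json" -X PUT "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch/estest" -d '{
+    "settings" : {
+        "index" : {
+            "number_of_shards" : 1,
+            "number_of_replicas" : 0
+        }
+      }
+    }'
+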
+###### (2) Index Data Writing ######
+
+For adding a "Hello Joe Smith" document:
+
+    curl -i -k -u username:password -H "Content-Type: application/json" -X PUT "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch/{index-name}/{document-type-name}/{document-id}" -d '{
+        "title":"Hello Joe Smith" 
+    }'
+
+Below is an example response:
+
+     HTTP/1.1 201 Created
+     Date: Wed, 23 May 2018 17:00:17 GMT
+     Location: /estest/greeting/1
+     Content-Type: application/json; charset=UTF-8
+     Content-Length: 158
+     Server: Jetty(9.2.15.v20160210)
+     
+     {"_index":"estest","_type":"greeting","_id":"1","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
+
+###### (3) Index Refreshing ######
+
+    curl -i -k -u username:password -X POST "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch/{index-name}/_refresh"
+
+Below is an example response:
+
+     HTTP/1.1 200 OK
+     Date: Wed, 23 May 2018 17:02:32 GMT
+     Content-Type: application/json; charset=UTF-8
+     Content-Length: 49
+     Server: Jetty(9.2.15.v20160210)
+     
+     {"_shards":{"total":1,"successful":1,"failed":0}}
+
+###### (4) Index Data Updating ######
+
+For changing "Hello Joe Smith" to "Hello Tom Smith":
+
+    curl -i -k -u username:password -H "Content-Type: application/json" -X PUT "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch/{index-name}/{document-type-name}/{document-id}" -d '{
+    "title":"Hello Tom Smith" 
+    }'
+
+Below is an example response:
+
+     HTTP/1.1 200 OK
+     Date: Wed, 23 May 2018 17:09:59 GMT
+     Content-Type: application/json; charset=UTF-8
+     Content-Length: 158
+     Server: Jetty(9.2.15.v20160210)
+     
+     {"_index":"estest","_type":"greeting","_id":"1","_version":2,"result":"updated","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}
+
+###### (5) Index Data Retrieval or Search ######
+
+For finding documents with "title":"Hello" in a specified document-type:
+
+    curl -i -k -u username:password -H "Accept: application/json" -X GET "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch/{index-name}/{document-type-name}/_search?pretty=true&q=title:Hello"
+
+Below is an example response:
+
+     HTTP/1.1 200 OK
+     Date: Wed, 23 May 2018 17:13:08 GMT
+     Content-Type: application/json; charset=UTF-8
+     Content-Length: 244
+     Server: Jetty(9.2.15.v20160210)
+     
+     {"took":0,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"estest","_type":"greeting","_id":"1","_score":0.2876821,"_source":{"title":"Hello Tom Smith"}}]}}
+
+###### (6) Index Deleting ######
+
+    curl -i -k -u username:password -X DELETE "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/elasticsearch/{index-name}"
+
+Below is an example response:
+
+     HTTP/1.1 200 OK
+     Date: Wed, 23 May 2018 17:20:19 GMT
+     Content-Type: application/json; charset=UTF-8
+     Content-Length: 21
+     Server: Jetty(9.2.15.v20160210)
+     
+     {"acknowledged":true}
+
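+The same queries can of course be issued from application code. Below is a minimal Java sketch of
+the node root query through Knox using HTTP Basic authentication; the gateway URL and credentials
+are placeholders, and it assumes the gateway's TLS certificate is trusted by the JVM (otherwise
+point `javax.net.ssl.trustStore` at a truststore containing it):
+
+    import java.io.BufferedReader;
+    import java.io.InputStreamReader;
+    import java.net.HttpURLConnection;
+    import java.net.URL;
+    import java.nio.charset.StandardCharsets;
+    import java.util.Base64;
+
+    public class KnoxElasticsearchSample {
+      public static void main( String[] args ) throws Exception {
+        URL url = new URL( "https://localhost:8443/gateway/sandbox/elasticsearch/" );
+        String credentials = Base64.getEncoder()
+            .encodeToString( "username:password".getBytes( StandardCharsets.UTF_8 ) );
+
+        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
+        connection.setRequestMethod( "GET" );
+        connection.setRequestProperty( "Accept", "application/json" );
+        connection.setRequestProperty( "Authorization", "Basic " + credentials );
+
+        // Print the JSON body returned by the Elasticsearch node root query
+        try ( BufferedReader reader = new BufferedReader(
+            new InputStreamReader( connection.getInputStream(), StandardCharsets.UTF_8 ) ) ) {
+          String line;
+          while ( ( line = reader.readLine() ) != null ) {
+            System.out.println( line );
+          }
+        }
+      }
+    }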

Added: knox/trunk/books/1.4.0/service_hbase.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_hbase.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_hbase.md (added)
+++ knox/trunk/books/1.4.0/service_hbase.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,720 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### HBase ###
+
+HBase provides an optional REST API (previously called Stargate).
+See the HBase REST Setup section below for getting started with the HBase REST API and Knox with the Hortonworks Sandbox environment.
+
+The gateway by default includes a sample topology descriptor file `{GATEWAY_HOME}/deployments/sandbox.xml`.  The value in this sample is configured to work with an installed Sandbox VM.
+
+    <service>
+        <role>WEBHBASE</role>
+        <url>http://localhost:60080</url>
+        <param>
+            <name>replayBufferSize</name>
+            <value>8</value>
+        </param>
+    </service>
+
+By default the gateway is configured to use port 60080 for HBase in the Sandbox.  Please see the steps to configure the port mapping below.
+
+A default replayBufferSize of 8KB is shown in the sample topology file above.  This may need to be increased if your query size is larger.
+
+#### HBase URL Mapping ####
+
+| ------- | ----------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/hbase` |
+| Cluster | `http://{hbase-rest-host}:8080/` |
+
+#### HBase Examples ####
+
+The examples below illustrate a set of basic operations with an HBase instance using the REST API.
+Use the following link to get more details about the HBase REST API: http://hbase.apache.org/book.html#_rest.
+
+Note: Some HBase examples may not work due to enabled [Access Control](http://hbase.apache.org/book.html#_securing_access_to_your_data). The user may not be granted access to perform the operations in the samples. In order to check if Access Control is configured in the HBase instance, verify `hbase-site.xml` for the presence of `org.apache.hadoop.hbase.security.access.AccessController` in the `hbase.coprocessor.master.classes` and `hbase.coprocessor.region.classes` properties.  
+To grant the Read, Write and Create permissions to the `guest` user, execute the following command:
+
+    echo grant 'guest', 'RWC' | hbase shell
+
+If you are using a cluster secured with Kerberos, you will need to have used `kinit` to authenticate to the KDC.
+
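+For example (the principal shown is only illustrative and must match your KDC configuration):
+
+    kinit guest@EXAMPLE.COM
+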
+#### HBase REST API Setup ####
+
+#### Launch REST API ####
+
+The command below launches the REST daemon on port 8080 (the default)
+
+    sudo {HBASE_BIN}/hbase-daemon.sh start rest
+
+Where `{HBASE_BIN}` is `/usr/hdp/current/hbase-master/bin/` in the case of an HDP install.
+
+To use a different port use the `-p` option:
+
+    sudo {HBASE_BIN}/hbase-daemon.sh start rest -p 60080
+
+#### Configure Sandbox port mapping for VirtualBox ####
+
+1. Select the VM
+2. Select menu Machine>Settings...
+3. Select tab Network
+4. Select Adapter 1
+5. Press Port Forwarding button
+6. Press Plus button to insert new rule: Name=HBASE REST, Host Port=60080, Guest Port=60080
+7. Press OK to close the rule window
+8. Press OK on the Network window to save the changes
+
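+The same forwarding rule can also be added from the command line with VBoxManage while the VM is
+powered off (the VM name is a placeholder, and the rule assumes the NAT adapter is adapter 1):
+
+    VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "HBASE REST,tcp,,60080,,60080"
+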
+#### HBase Restart ####
+
+If it becomes necessary to restart HBase, you can log into the hosts running HBase and use these steps.
+
+    sudo {HBASE_BIN}/hbase-daemon.sh stop rest
+    sudo -u hbase {HBASE_BIN}/hbase-daemon.sh stop regionserver
+    sudo -u hbase {HBASE_BIN}/hbase-daemon.sh stop master
+    sudo -u hbase {HBASE_BIN}/hbase-daemon.sh stop zookeeper
+
+    sudo -u hbase {HBASE_BIN}/hbase-daemon.sh start regionserver
+    sudo -u hbase {HBASE_BIN}/hbase-daemon.sh start master
+    sudo -u hbase {HBASE_BIN}/hbase-daemon.sh start zookeeper
+    sudo {HBASE_BIN}/hbase-daemon.sh start rest -p 60080
+
+Where `{HBASE_BIN}` is `/usr/hdp/current/hbase-master/bin/` in the case of an HDP Sandbox install.
+ 
+#### HBase client DSL ####
+
+For more details about client DSL usage please look at the chapter about the client DSL in this guide.
+
+After launching the shell, execute the following command to be able to use the snippets below.
+`import org.apache.knox.gateway.shell.hbase.HBase;`
+ 
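+A session must also be established before the snippets can be used; for example (the gateway URL
+and credentials below are the sandbox values used by the full script later in this section):
+
+    import org.apache.knox.gateway.shell.Hadoop
+    import org.apache.knox.gateway.shell.hbase.HBase
+
+    session = Hadoop.login( "https://localhost:8443/gateway/sandbox", "guest", "guest-password" )
+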
+#### systemVersion() - Query Software Version.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+    * `HBase.session(session).systemVersion().now().string`
+
+#### clusterVersion() - Query Storage Cluster Version.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+    * `HBase.session(session).clusterVersion().now().string`
+
+#### status() - Query Storage Cluster Status.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+    * `HBase.session(session).status().now().string`
+
+#### table().list() - Query Table List.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+  * `HBase.session(session).table().list().now().string`
+
+#### table(String tableName).schema() - Query Table Schema.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+    * `HBase.session(session).table(tableName).schema().now().string`
+
+#### table(String tableName).create() - Create Table Schema.
+
+* Request
+    * attribute(String name, Object value) - the table's attribute.
+    * family(String name) - starts family definition. Has sub requests:
+    * attribute(String name, Object value) - the family's attribute.
+    * endFamilyDef() - finishes family definition.
+* Response
+    * EmptyResponse
+* Example
+
+
+    HBase.session(session).table(tableName).create()
+       .attribute("tb_attr1", "value1")
+       .attribute("tb_attr2", "value2")
+       .family("family1")
+           .attribute("fm_attr1", "value3")
+           .attribute("fm_attr2", "value4")
+       .endFamilyDef()
+       .family("family2")
+       .family("family3")
+       .endFamilyDef()
+       .attribute("tb_attr3", "value5")
+       .now()
+
+#### table(String tableName).update() - Update Table Schema.
+
+* Request
+    * family(String name) - starts family definition. Has sub requests:
+    * attribute(String name, Object value) - the family's attribute.
+    * endFamilyDef() - finishes family definition.
+* Response
+    * EmptyResponse
+* Example
+
+
+    HBase.session(session).table(tableName).update()
+         .family("family1")
+             .attribute("fm_attr1", "new_value3")
+         .endFamilyDef()
+         .family("family4")
+             .attribute("fm_attr3", "value6")
+         .endFamilyDef()
+         .now()
+
+#### table(String tableName).regions() - Query Table Metadata.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+    * `HBase.session(session).table(tableName).regions().now().string`
+
+#### table(String tableName).delete() - Delete Table.
+
+* Request
+    * No request parameters.
+* Response
+    * EmptyResponse
+* Example
+    * `HBase.session(session).table(tableName).delete().now()`
+
+#### table(String tableName).row(String rowId).store() - Cell Store.
+
+* Request
+    * column(String family, String qualifier, Object value, Long time) - the data to store; "qualifier" may be "null"; "time" is optional.
+* Response
+    * EmptyResponse
+* Example
+
+
+    HBase.session(session).table(tableName).row("row_id_1").store()
+         .column("family1", "col1", "col_value1")
+         .column("family1", "col2", "col_value2", 1234567890l)
+         .column("family2", null, "fam_value1")
+         .now()
+
+
+    HBase.session(session).table(tableName).row("row_id_2").store()
+         .column("family1", "row2_col1", "row2_col_value1")
+         .now()
+
+#### table(String tableName).row(String rowId).query() - Cell or Row Query.
+
+* rowId is optional. Querying with null or empty rowId will select all rows.
+* Request
+    * column(String family, String qualifier) - the column to select; "qualifier" is optional.
+    * startTime(Long) - the lower bound for filtration by time.
+    * endTime(Long) - the upper bound for filtration by time.
+    * times(Long startTime, Long endTime) - the lower and upper bounds for filtration by time.
+    * numVersions(Long) - the maximum number of versions to return.
+* Response
+    * BasicResponse
+* Example
+
+
+    HBase.session(session).table(tableName).row("row_id_1")
+         .query()
+         .now().string
+
+
+    HBase.session(session).table(tableName).row().query().now().string
+
+
+    HBase.session(session).table(tableName).row().query()
+         .column("family1", "row2_col1")
+         .column("family2")
+         .times(0, Long.MAX_VALUE)
+         .numVersions(1)
+         .now().string
+
+#### table(String tableName).row(String rowId).delete() - Row, Column, or Cell Delete.
+
+* Request
+    * column(String family, String qualifier) - the column to delete; "qualifier" is optional.
+    * time(Long) - the upper bound for time filtration.
+* Response
+    * EmptyResponse
+* Example
+
+
+    HBase.session(session).table(tableName).row("row_id_1")
+         .delete()
+         .column("family1", "col1")
+         .now()
+
+
+    HBase.session(session).table(tableName).row("row_id_1")
+         .delete()
+         .column("family2")
+         .time(Long.MAX_VALUE)
+         .now()
+
+#### table(String tableName).scanner().create() - Scanner Creation.
+
+* Request
+    * startRow(String) - the lower bound for filtration by row id.
+    * endRow(String) - the upper bound for filtration by row id.
+    * rows(String startRow, String endRow) - the lower and upper bounds for filtration by row id.
+    * column(String family, String qualifier) - the column to select; "qualifier" is optional.
+    * batch(Integer) - the batch size.
+    * startTime(Long) - the lower bound for filtration by time.
+    * endTime(Long) - the upper bound for filtration by time.
+    * times(Long startTime, Long endTime) - the lower and upper bounds for filtration by time.
+    * filter(String) - the filter XML definition.
+    * maxVersions(Integer) - the maximum number of versions to return.
+* Response
+    * scannerId : String - the scanner ID of the created scanner. Consumes body.
+* Example
+
+
+    HBase.session(session).table(tableName).scanner().create()
+         .column("family1", "col2")
+         .column("family2")
+         .startRow("row_id_1")
+         .endRow("row_id_2")
+         .batch(1)
+         .startTime(0)
+         .endTime(Long.MAX_VALUE)
+         .filter("")
+         .maxVersions(100)
+         .now()
+
+#### table(String tableName).scanner(String scannerId).getNext() - Scanner Get Next.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+    * `HBase.session(session).table(tableName).scanner(scannerId).getNext().now().string`
+
+#### table(String tableName).scanner(String scannerId).delete() - Scanner Deletion.
+
+* Request
+    * No request parameters.
+* Response
+    * EmptyResponse
+* Example
+    * `HBase.session(session).table(tableName).scanner(scannerId).delete().now()`
+
+### HBase via Client DSL ###
+
+This example illustrates the sequence of all basic HBase operations:
+1. get system version
+2. get cluster version
+3. get cluster status
+4. create the table
+5. get list of tables
+6. get table schema
+7. update table schema
+8. insert single row into table
+9. query row by id
+10. query all rows
+11. delete cell from row
+12. delete entire column family from row
+13. get table regions
+14. create scanner
+15. fetch values using scanner
+16. drop scanner
+17. drop the table
+
+There are several ways to do this depending upon your preference.
+
+You can use the Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar samples/ExampleHBase.groovy
+
+You can manually type the KnoxShell DSL script into the interactive Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar
+
+Each line from the file below will need to be typed or copied into the interactive shell.
+
+    /**
+     * Licensed to the Apache Software Foundation (ASF) under one
+     * or more contributor license agreements.  See the NOTICE file
+     * distributed with this work for additional information
+     * regarding copyright ownership.  The ASF licenses this file
+     * to you under the Apache License, Version 2.0 (the
+     * "License"); you may not use this file except in compliance
+     * with the License.  You may obtain a copy of the License at
+     *
+     *     http://www.apache.org/licenses/LICENSE-2.0
+     *
+     * Unless required by applicable law or agreed to in writing, software
+     * distributed under the License is distributed on an "AS IS" BASIS,
+     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+     * See the License for the specific language governing permissions and
+     * limitations under the License.
+     */
+    package org.apache.knox.gateway.shell.hbase
+
+    import org.apache.knox.gateway.shell.Hadoop
+
+    import static java.util.concurrent.TimeUnit.SECONDS
+
+    gateway = "https://localhost:8443/gateway/sandbox";
+    username = "guest"
+    password = "guest-password"
+    tableName = "test_table"
+
+    session = Hadoop.login(gateway, username, password)
+
+    println "System version : " + 
HBase.session(session).systemVersion().now().string
+
+    println "Cluster version : " + 
HBase.session(session).clusterVersion().now().string
+
+    println "Status : " + HBase.session(session).status().now().string
+
+    println "Creating table '" + tableName + "'..."
+
+    HBase.session(session).table(tableName).create()  \
+        .attribute("tb_attr1", "value1")  \
+        .attribute("tb_attr2", "value2")  \
+        .family("family1")  \
+            .attribute("fm_attr1", "value3")  \
+            .attribute("fm_attr2", "value4")  \
+        .endFamilyDef()  \
+        .family("family2")  \
+        .family("family3")  \
+        .endFamilyDef()  \
+        .attribute("tb_attr3", "value5")  \
+        .now()
+
+    println "Done"
+
+    println "Table List : " + 
HBase.session(session).table().list().now().string
+
+    println "Schema for table '" + tableName + "' : " + HBase.session(session) 
 \
+        .table(tableName)  \
+        .schema()  \
+        .now().string
+
+    println "Updating schema of table '" + tableName + "'..."
+
+    HBase.session(session).table(tableName).update()  \
+        .family("family1")  \
+            .attribute("fm_attr1", "new_value3")  \
+        .endFamilyDef()  \
+        .family("family4")  \
+            .attribute("fm_attr3", "value6")  \
+        .endFamilyDef()  \
+        .now()
+
+    println "Done"
+
+    println "Schema for table '" + tableName + "' : " + HBase.session(session) 
 \
+        .table(tableName)  \
+        .schema()  \
+        .now().string
+
+    println "Inserting data into table..."
+
+    HBase.session(session).table(tableName).row("row_id_1").store()  \
+        .column("family1", "col1", "col_value1")  \
+        .column("family1", "col2", "col_value2", 1234567890l)  \
+        .column("family2", null, "fam_value1")  \
+        .now()
+
+    HBase.session(session).table(tableName).row("row_id_2").store()  \
+        .column("family1", "row2_col1", "row2_col_value1")  \
+        .now()
+
+    println "Done"
+
+    println "Querying row by id..."
+
+    println HBase.session(session).table(tableName).row("row_id_1")  \
+        .query()  \
+        .now().string
+
+    println "Querying all rows..."
+
+    println HBase.session(session).table(tableName).row().query().now().string
+
+    println "Querying row by id with extended settings..."
+
+    println HBase.session(session).table(tableName).row().query()  \
+        .column("family1", "row2_col1")  \
+        .column("family2")  \
+        .times(0, Long.MAX_VALUE)  \
+        .numVersions(1)  \
+        .now().string
+
+    println "Deleting cell..."
+
+    HBase.session(session).table(tableName).row("row_id_1")  \
+        .delete()  \
+        .column("family1", "col1")  \
+        .now()
+
+    println "Rows after delete:"
+
+    println HBase.session(session).table(tableName).row().query().now().string
+
+    println "Extended cell delete"
+
+    HBase.session(session).table(tableName).row("row_id_1")  \
+        .delete()  \
+        .column("family2")  \
+        .time(Long.MAX_VALUE)  \
+        .now()
+
+    println "Rows after delete:"
+
+    println HBase.session(session).table(tableName).row().query().now().string
+
+    println "Table regions : " + HBase.session(session).table(tableName)  \
+        .regions()  \
+        .now().string
+
+    println "Creating scanner..."
+
+    scannerId = HBase.session(session).table(tableName).scanner().create()  \
+        .column("family1", "col2")  \
+        .column("family2")  \
+        .startRow("row_id_1")  \
+        .endRow("row_id_2")  \
+        .batch(1)  \
+        .startTime(0)  \
+        .endTime(Long.MAX_VALUE)  \
+        .filter("")  \
+        .maxVersions(100)  \
+        .now().scannerId
+
+    println "Scanner id=" + scannerId
+
+    println "Scanner get next..."
+
+    println HBase.session(session).table(tableName).scanner(scannerId)  \
+        .getNext()  \
+        .now().string
+
+    println "Dropping scanner with id=" + scannerId
+
+    HBase.session(session).table(tableName).scanner(scannerId).delete().now()
+
+    println "Done"
+
+    println "Dropping table '" + tableName + "'..."
+
+    HBase.session(session).table(tableName).delete().now()
+
+    println "Done"
+
+    session.shutdown(10, SECONDS)
+
+### HBase via cURL
+
+#### Get software version
+
+Set Accept Header to "text/plain", "text/xml", "application/json" or "application/x-protobuf"
+
+    %  curl -ik -u guest:guest-password\
+     -H "Accept:  application/json"\
+     -X GET 'https://localhost:8443/gateway/sandbox/hbase/version'
+
+#### Get version information regarding the HBase cluster backing the REST API instance
+
+Set Accept Header to "text/plain", "text/xml" or "application/x-protobuf"
+
+    %  curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X GET 'https://localhost:8443/gateway/sandbox/hbase/version/cluster'
+
+#### Get detailed status on the HBase cluster backing the REST API instance.
+
+Set Accept Header to "text/plain", "text/xml", "application/json" or "application/x-protobuf"
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X GET 'https://localhost:8443/gateway/sandbox/hbase/status/cluster'
+
+#### Get the list of available tables.
+
+Set Accept Header to "text/plain", "text/xml", "application/json" or "application/x-protobuf"
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X GET 'https://localhost:8443/gateway/sandbox/hbase'
+
+#### Create table with two column families using xml input
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"   -H "Content-Type: text/xml"\
+     -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="table1"><ColumnSchema name="family1"/><ColumnSchema name="family2"/></TableSchema>'\
+     -X PUT 'https://localhost:8443/gateway/sandbox/hbase/table1/schema'
+
+#### Create table with two column families using JSON input
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: application/json"  -H "Content-Type: application/json"\
+     -d '{"name":"table2","ColumnSchema":[{"name":"family3"},{"name":"family4"}]}'\
+     -X PUT 'https://localhost:8443/gateway/sandbox/hbase/table2/schema'
+
+#### Get table metadata
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X GET 'https://localhost:8443/gateway/sandbox/hbase/table1/regions'
+
+#### Insert single row table
+
+    curl -ik -u guest:guest-password\
+     -H "Content-Type: text/xml"\
+     -H "Accept: text/xml"\
+     -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93MQ=="><Cell column="ZmFtaWx5MTpjb2wx" >dGVzdA==</Cell></Row></CellSet>'\
+     -X POST 'https://localhost:8443/gateway/sandbox/hbase/table1/row1'
+
+#### Insert multiple rows into table
+
+    curl -ik -u guest:guest-password\
+     -H "Content-Type: text/xml"\
+     -H "Accept: text/xml"\
+     -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93MA=="><Cell column="ZmFtaWx5Mzpjb2x1bW4x" >dGVzdA==</Cell></Row><Row key="cm93MQ=="><Cell column="ZmFtaWx5NDpjb2x1bW4x" >dGVzdA==</Cell></Row></CellSet>'\
+     -X POST 'https://localhost:8443/gateway/sandbox/hbase/table2/false-row-key'
+
+#### Get all data from table
+
+Set Accept Header to "text/plain", "text/xml", "application/json" or "application/x-protobuf"
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X GET 'https://localhost:8443/gateway/sandbox/hbase/table1/*'
+
+#### Execute cell or row query
+
+Set Accept Header to "text/plain", "text/xml", "application/json" or "application/x-protobuf"
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X GET 'https://localhost:8443/gateway/sandbox/hbase/table1/row1/family1:col1'
+
+#### Delete entire row from table
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X DELETE 'https://localhost:8443/gateway/sandbox/hbase/table2/row0'
+
+#### Delete column family from row
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X DELETE 'https://localhost:8443/gateway/sandbox/hbase/table2/row0/family3'
+
+#### Delete specific column from row
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X DELETE 'https://localhost:8443/gateway/sandbox/hbase/table2/row0/family3:column1'
+
+#### Create scanner
+
+Scanner URL will be in Location response header
+
+    curl -ik -u guest:guest-password\
+     -H "Content-Type: text/xml"\
+     -d '<Scanner batch="1"/>'\
+     -X PUT 'https://localhost:8443/gateway/sandbox/hbase/table1/scanner'
+
+#### Get the values of the next cells found by the scanner
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: application/json"\
+     -X GET 'https://localhost:8443/gateway/sandbox/hbase/table1/scanner/13705290446328cff5ed'
+
+#### Delete scanner
+
+    curl -ik -u guest:guest-password\
+     -H "Accept: text/xml"\
+     -X DELETE 'https://localhost:8443/gateway/sandbox/hbase/table1/scanner/13705290446328cff5ed'
+
+#### Delete table
+
+    curl -ik -u guest:guest-password\
+     -X DELETE 'https://localhost:8443/gateway/sandbox/hbase/table1/schema'
+
+
+### HBase REST HA ###
+
+Please look at #[Default Service HA support] if you wish to explicitly list the URLs under the service definition.
+
+If you run the HBase REST Server from the HBase Region Server nodes, you can utilize more advanced HA support.  The HBase 
+REST Server does not register itself with ZooKeeper.  So the Knox HA component looks in ZooKeeper for instances of HBase Region 
+Servers and then performs a lightweight ping for the presence of the REST Server on the same hosts.  The user should not supply URLs 
+in the service definition.  
+
+Note: Users of Ambari must manually start up the HBase REST Server.
+
+To enable HA functionality for HBase in Knox the following configuration has to be added to the topology file.
+
+    <provider>
+        <role>ha</role>
+        <name>HaProvider</name>
+        <enabled>true</enabled>
+        <param>
+            <name>WEBHBASE</name>
+            <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181</value>
+       </param>
+    </provider>
+
+The role and name of the provider above must be as shown. The name in the 'param' section must match that of the service
+role name that is being configured for HA and the value in the 'param' section is the configuration for that particular
+service in HA mode. In this case the name is 'WEBHBASE'.
+
+The various configuration parameters are described below:
+
+* maxFailoverAttempts -
+This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic
+in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom
+of the list. If the list is exhausted and the maximum number of attempts is not reached then the first URL will be tried
+again after the list is fetched again from Zookeeper (a refresh of the list is done at this point)
+
+* failoverSleep -
+The amount of time in millis that the process will wait or sleep before attempting to failover.
+
+* enabled -
+Flag to turn the particular service on or off for HA.
+
+* zookeeperEnsemble -
+A comma separated list of host names (or IP addresses) of the ZooKeeper hosts that make up the ensemble that the HBase
+servers register their information with.
+
+And for the service configuration itself the URLs need NOT be added to the list. For example:
+
+    <service>
+        <role>WEBHBASE</role>
+    </service>
+
+Please note that there is no `<url>` tag specified here as the URLs for the HBase servers are obtained from ZooKeeper.

Added: knox/trunk/books/1.4.0/service_hive.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_hive.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_hive.md (added)
+++ knox/trunk/books/1.4.0/service_hive.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,329 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+### Hive ###
+
+The [Hive wiki pages](https://cwiki.apache.org/confluence/display/Hive/Home) describe Hive installation and configuration processes.
+In the sandbox, the configuration file for Hive is located at `/etc/hive/hive-site.xml`.
+Hive Server has to be started in HTTP mode.
+Note the properties shown below as they are related to configuration required by the gateway.
+
+    <property>
+        <name>hive.server2.thrift.http.port</name>
+        <value>10001</value>
+        <description>Port number when in HTTP mode.</description>
+    </property>
+
+    <property>
+        <name>hive.server2.thrift.http.path</name>
+        <value>cliservice</value>
+        <description>Path component of URL endpoint when in HTTP mode.</description>
+    </property>
+
+    <property>
+        <name>hive.server2.transport.mode</name>
+        <value>http</value>
+        <description>Server transport mode. "binary" or "http".</description>
+    </property>
+
+    <property>
+        <name>hive.server2.allow.user.substitution</name>
+        <value>true</value>
+    </property>
+
+The gateway by default includes a sample topology descriptor file `{GATEWAY_HOME}/deployments/sandbox.xml`.
+The value in this sample is configured to work with an installed Sandbox VM.
+
+    <service>
+        <role>HIVE</role>
+        <url>http://localhost:10001/cliservice</url>
+        <param>
+            <name>replayBufferSize</name>
+            <value>8</value>
+        </param>
+    </service>
+
+By default the gateway is configured to use the binary transport mode for Hive in the Sandbox.
+
+A default replayBufferSize of 8KB is shown in the sample topology file above.  This may need to be increased if your query size is larger.
+
+#### Hive JDBC URL Mapping ####
+
+| ------- | ------------------------------------------------------------------------------- |
+| Gateway | `jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive` |
+| Cluster | `http://{hive-host}:{hive-port}/{hive-path}` |
+
+#### Hive Examples ####
+
+This guide provides detailed examples for how to do some basic interactions with Hive via the Apache Knox Gateway.
+
+##### Hive Setup #####
+
+1. Make sure you are running the correct version of Hive to ensure JDBC/Thrift/HTTP support.
+2. Make sure Hive Server is running on the correct port.
+3. Make sure Hive Server is running in HTTP mode.
+4. Client side (JDBC):
+     1. Hive JDBC in HTTP mode depends on the following minimal set of libraries to run successfully (they must be on the classpath):
+         * hive-jdbc-0.14.0-standalone.jar;
+         * commons-logging-1.1.3.jar;
+     2. The connection URL has to be the following (see also the Beeline example below): `jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive`
+     3. Look at https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DDLOperations for examples.
+       Hint: For testing it would be better to execute `set hive.security.authorization.enabled=false` as the first statement.
+       Hint: Good examples of Hive DDL/DML can be found here: http://gettingstarted.hadooponazure.com/hw/hive.html
+
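+As a quick connectivity check, the same connection URL can also be used from Beeline; a sketch with
+the sandbox truststore, context path and credentials used elsewhere in this guide (adjust all values
+to your environment):
+
+    beeline -u "jdbc:hive2://localhost:8443/;ssl=true;sslTrustStore=/usr/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knoxsecret;transportMode=http;httpPath=gateway/sandbox/hive" -n guest -p guest-password
+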
+##### Customization #####
+
+This example may need to be tailored to the execution environment.
+In particular, the host name, host port, user name, user password and context path may need to be changed to match your environment.
+There is one example file in the distribution that may need to be customized.
+Take a moment to review this file.
+All of the values that may need to be customized can be found together at the top of the file.
+
+* samples/hive/java/jdbc/sandbox/HiveJDBCSample.java
+
+##### Client JDBC Example #####
+
+A sample for creating a new table, loading data into it from the file system local to the Hive server, and querying data from that table.
+
+###### Java ######
+
+    import java.sql.Connection;
+    import java.sql.DriverManager;
+    import java.sql.ResultSet;
+    import java.sql.SQLException;
+    import java.sql.Statement;
+
+    import java.util.logging.Level;
+    import java.util.logging.Logger;
+
+    public class HiveJDBCSample {
+
+      public static void main( String[] args ) {
+        Connection connection = null;
+        Statement statement = null;
+        ResultSet resultSet = null;
+
+        try {
+          String user = "guest";
+          String password = user + "-password";
+          String gatewayHost = "localhost";
+          int gatewayPort = 8443;
+          String trustStore = "/usr/lib/knox/data/security/keystores/gateway.jks";
+          String trustStorePassword = "knoxsecret";
+          String contextPath = "gateway/sandbox/hive";
+          String connectionString = String.format( "jdbc:hive2://%s:%d/;ssl=true;sslTrustStore=%s;trustStorePassword=%s?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/%s", gatewayHost, gatewayPort, trustStore, trustStorePassword, contextPath );
+
+          // load Hive JDBC Driver
+          Class.forName( "org.apache.hive.jdbc.HiveDriver" );
+
+          // configure JDBC connection
+          connection = DriverManager.getConnection( connectionString, user, password );
+
+          statement = connection.createStatement();
+
+          // disable Hive authorization - it could be omitted if Hive authorization
+          // was configured properly
+          statement.execute( "set hive.security.authorization.enabled=false" );
+
+          // create sample table
+          statement.execute( "CREATE TABLE logs(column1 string, column2 string, column3 string, column4 string, column5 string, column6 string, column7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '" );
+
+          // load data into Hive from file /tmp/log.txt which is placed on the local file system
+          statement.execute( "LOAD DATA LOCAL INPATH '/tmp/log.txt' OVERWRITE INTO TABLE logs" );
+
+          resultSet = statement.executeQuery( "SELECT * FROM logs" );
+
+          while ( resultSet.next() ) {
+            System.out.println( resultSet.getString( 1 ) + " --- " + resultSet.getString( 2 ) + " --- " + resultSet.getString( 3 ) + " --- " + resultSet.getString( 4 ) );
+          }
+        } catch ( ClassNotFoundException ex ) {
+          Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+        } catch ( SQLException ex ) {
+          Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+        } finally {
+          if ( resultSet != null ) {
+            try {
+              resultSet.close();
+            } catch ( SQLException ex ) {
+              Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+            }
+          }
+          if ( statement != null ) {
+            try {
+              statement.close();
+            } catch ( SQLException ex ) {
+              Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+            }
+          }
+          if ( connection != null ) {
+            try {
+              connection.close();
+            } catch ( SQLException ex ) {
+              Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+            }
+          }
+        }
+      }
+    }
+
+###### Groovy ######
+
+Make sure that the `{GATEWAY_HOME}/ext` directory contains the following libraries for successful execution:
+
+- hive-jdbc-0.14.0-standalone.jar;
+- commons-logging-1.1.3.jar;
+
+There are several ways to execute this sample depending upon your preference.
+
+You can use the Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar samples/hive/groovy/jdbc/sandbox/HiveJDBCSample.groovy
+
+You can manually type the KnoxShell DSL script into the interactive Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar
+
+Each line from the file below will need to be typed or copied into the interactive shell.
+
+    import java.sql.DriverManager
+
+    user = "guest";
+    password = user + "-password";
+    gatewayHost = "localhost";
+    gatewayPort = 8443;
+    trustStore = "/usr/lib/knox/data/security/keystores/gateway.jks";
+    trustStorePassword = "knoxsecret";
+    contextPath = "gateway/sandbox/hive";
+    connectionString = String.format( "jdbc:hive2://%s:%d/;ssl=true;sslTrustStore=%s;trustStorePassword=%s?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/%s", gatewayHost, gatewayPort, trustStore, trustStorePassword, contextPath );
+
+    // Load Hive JDBC Driver
+    Class.forName( "org.apache.hive.jdbc.HiveDriver" );
+
+    // Configure JDBC connection
+    connection = DriverManager.getConnection( connectionString, user, password );
+
+    statement = connection.createStatement();
+
+    // Disable Hive authorization - This can be omitted if Hive authorization is configured properly
+    statement.execute( "set hive.security.authorization.enabled=false" );
+
+    // Create sample table
+    statement.execute( "CREATE TABLE logs(column1 string, column2 string, 
column3 string, column4 string, column5 string, column6 string, column7 string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '" );
+
+    // Load data into Hive from file /tmp/log.txt which is placed on the local file system
+    statement.execute( "LOAD DATA LOCAL INPATH '/tmp/log.txt' OVERWRITE INTO TABLE logs" );
+
+    resultSet = statement.executeQuery( "SELECT * FROM logs" );
+
+    while ( resultSet.next() ) {
+      System.out.println( resultSet.getString( 1 ) + " --- " + resultSet.getString( 2 ) );
+    }
+
+    resultSet.close();
+    statement.close();
+    connection.close();
+
+Examples use 'log.txt' with content:
+
+    2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 577725851
+    2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991281254
+    2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656
+    2012-02-03 18:35:34 SampleClass3 [WARN] missing id 423340895
+    2012-02-03 18:35:34 SampleClass5 [TRACE] verbose detail for id 2082654978
+    2012-02-03 18:35:34 SampleClass0 [ERROR] incorrect id  1886438513
+    2012-02-03 18:35:34 SampleClass9 [TRACE] verbose detail for id 438634209
+    2012-02-03 18:35:34 SampleClass8 [DEBUG] detail for id 2074121310
+    2012-02-03 18:35:34 SampleClass0 [TRACE] verbose detail for id 1505582508
+    2012-02-03 18:35:34 SampleClass0 [TRACE] verbose detail for id 1903854437
+    2012-02-03 18:35:34 SampleClass7 [DEBUG] detail for id 915853141
+    2012-02-03 18:35:34 SampleClass3 [TRACE] verbose detail for id 303132401
+    2012-02-03 18:35:34 SampleClass6 [TRACE] verbose detail for id 151914369
+    2012-02-03 18:35:34 SampleClass2 [DEBUG] detail for id 146527742
+    ...
+
+Expected output:
+
+    2012-02-03 --- 18:35:34 --- SampleClass6 --- [INFO]
+    2012-02-03 --- 18:35:34 --- SampleClass4 --- [FATAL]
+    2012-02-03 --- 18:35:34 --- SampleClass3 --- [DEBUG]
+    2012-02-03 --- 18:35:34 --- SampleClass3 --- [WARN]
+    2012-02-03 --- 18:35:34 --- SampleClass5 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass0 --- [ERROR]
+    2012-02-03 --- 18:35:34 --- SampleClass9 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass8 --- [DEBUG]
+    2012-02-03 --- 18:35:34 --- SampleClass0 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass0 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass7 --- [DEBUG]
+    2012-02-03 --- 18:35:34 --- SampleClass3 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass6 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass2 --- [DEBUG]
+    ...
+
+### HiveServer2 HA ###
+
+Knox provides basic failover functionality for calls made to HiveServer2 when more than one HiveServer2 instance is
+installed in the cluster and registered with the same ZooKeeper ensemble. The HA functionality in this case fetches the
+HiveServer2 URL information from a ZooKeeper ensemble, so the user need only supply the necessary ZooKeeper
+configuration and not the Hive connection URLs.
+
+To enable HA functionality for Hive in Knox, the following configuration has to be added to the topology file:
+
+    <provider>
+        <role>ha</role>
+        <name>HaProvider</name>
+        <enabled>true</enabled>
+        <param>
+            <name>HIVE</name>
+            <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181;zookeeperNamespace=hiveserver2</value>
+        </param>
+    </provider>
+
+The role and name of the provider above must be as shown. The name in the 'param' section must match the service
+role name that is being configured for HA, and the value in the 'param' section is the configuration for that particular
+service in HA mode. In this case the name is 'HIVE'.
+
+The various configuration parameters are described below:
+
+* maxFailoverAttempts -
+The maximum number of times a failover will be attempted. The current failover strategy is very simplistic
+in that the next URL in the list of URLs provided for the service is used, and the one that failed is put at the bottom
+of the list. If the list is exhausted and the maximum number of attempts has not been reached, the list is fetched again
+from ZooKeeper (a refresh of the list is done at this point) and the first URL is tried again.
+
+* failoverSleep -
+The amount of time in milliseconds that the process will wait or sleep before attempting to failover.
+
+* enabled -
+Flag to turn the particular service on or off for HA.
+
+* zookeeperEnsemble -
+A comma-separated list of host names (or IP addresses) of the ZooKeeper hosts that make up the ensemble with which the Hive
+servers register their information. This value can be obtained from Hive's config file `hive-site.xml` as the value
+for the parameter 'hive.zookeeper.quorum'.
+
+* zookeeperNamespace -
+The namespace under which HiveServer2 information is registered in the ZooKeeper ensemble. This value can be
+obtained from Hive's config file `hive-site.xml` as the value for the parameter 'hive.server2.zookeeper.namespace' (a lookup sketch follows this list).
+
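+Both ZooKeeper-related values can usually be read straight out of Hive's `hive-site.xml`. A minimal lookup sketch, assuming the common `/etc/hive/conf/hive-site.xml` location (adjust the path for your installation):
+
+    # Look up the ZooKeeper ensemble and HiveServer2 namespace (path is illustrative)
+    grep -A1 'hive.zookeeper.quorum' /etc/hive/conf/hive-site.xml
+    grep -A1 'hive.server2.zookeeper.namespace' /etc/hive/conf/hive-site.xml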
+
+For the service configuration itself, the URLs need not be added to the list. For example:
+
+    <service>
+        <role>HIVE</role>
+    </service>
+
+Please note that there is no `<url>` tag specified here as the URLs for the Hive servers are obtained from ZooKeeper.

Added: knox/trunk/books/1.4.0/service_kafka.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_kafka.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_kafka.md (added)
+++ knox/trunk/books/1.4.0/service_kafka.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,108 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Kafka ###
+
+Knox provides gateway functionality to Kafka when used with the Confluent Kafka REST Proxy. The Kafka REST APIs allow the user to view the status
+of the cluster, perform administrative actions and produce messages.
+
+Note: Consumption of messages via Knox is not supported at this time.
+
+The docs for the Confluent Kafka REST Proxy can be found here:
+http://docs.confluent.io/current/kafka-rest/docs/index.html
+
+To enable this functionality, a topology file needs to have the following configuration:
+
+    <service>
+        <role>KAFKA</role>
+        <url>http://<kafka-rest-host>:<kafka-rest-port></url>
+    </service>
+
+The default Kafka REST Proxy port is 8082. If it is configured to some other port, that configuration can be found in
+`kafka-rest.properties` under the property `listeners`.
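+
+For example, the configured listener can be checked on the REST Proxy host; the property file path and value below are illustrative only:
+
+    # Check the Kafka REST Proxy listener (path and value are illustrative)
+    grep '^listeners' /etc/kafka-rest/kafka-rest.properties
+    # listeners=http://0.0.0.0:8082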
+
+#### Kafka URL Mapping ####
+
+For Kafka URLs, the mapping of Knox Gateway accessible URLs to direct Kafka URLs is the following.
+
+| ------- | ----------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/kafka`    |
+| Cluster | `http://{kafka-rest-host}:{kafka-rest-port}`                                    |
+
+
+#### Kafka Examples via cURL ####
+
+Some of the various calls that can be made and examples using curl are listed below.
+
+    # 0. Getting topic info
+
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/kafka/topics'
+
+    # 1. Publish message to topic
+
+    curl -ikv -u guest:guest-password -X POST 'https://localhost:8443/gateway/sandbox/kafka/topics/TOPIC1' -H 'Content-Type: application/vnd.kafka.json.v2+json' -H 'Accept: application/vnd.kafka.v2+json' --data '{"records":[{"value":{"foo":"bar"}}]}'
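+
+As a further illustration, metadata for a single topic can be requested the same way. The example below assumes a topic named `TOPIC1` already exists; the endpoint follows the Kafka REST Proxy `/topics/{topic}` form:
+
+    # 2. Get metadata for a single topic (assumes TOPIC1 exists)
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/kafka/topics/TOPIC1'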
+
+### Kafka HA ###
+
+Knox provides basic failover functionality for calls made to Kafka. Since the Confluent Kafka REST Proxy does not register
+itself with ZooKeeper, the HA component looks in ZooKeeper for instances of Kafka and then performs a lightweight ping for
+the presence of the REST Proxy on the same hosts. As such, the Kafka REST Proxy must be installed on the same host as Kafka.
+The user should not supply URLs in the service definition.
+
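+Before enabling HA, it can be useful to confirm that the REST Proxy actually responds on each Kafka broker host, since that co-location is what the lightweight ping relies on. A minimal check, assuming a broker host named `machine1` and the default proxy port:
+
+    # Verify the Kafka REST Proxy is reachable on the broker host (hostname and port are illustrative)
+    curl -i 'http://machine1:8082/topics'
+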
+Note: Users of Ambari must manually start the Confluent Kafka REST Proxy.
+
+To enable HA functionality for Kafka in Knox, the following configuration has to be added to the topology file:
+
+    <provider>
+        <role>ha</role>
+        <name>HaProvider</name>
+        <enabled>true</enabled>
+        <param>
+            <name>KAFKA</name>
+            <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181</value>
+        </param>
+    </provider>
+
+The role and name of the provider above must be as shown. The name in the 'param' section must match the service
+role name that is being configured for HA, and the value in the 'param' section is the configuration for that particular
+service in HA mode. In this case the name is 'KAFKA'.
+
+The various configuration parameters are described below:
+
+* maxFailoverAttempts -
+The maximum number of times a failover will be attempted. The current failover strategy is very simplistic
+in that the next URL in the list of URLs provided for the service is used, and the one that failed is put at the bottom
+of the list. If the list is exhausted and the maximum number of attempts has not been reached, the list is fetched again
+from ZooKeeper (a refresh of the list is done at this point) and the first URL is tried again.
+
+* failoverSleep -
+The amount of time in milliseconds that the process will wait or sleep before attempting to failover.
+
+* enabled -
+Flag to turn the particular service on or off for HA.
+
+* zookeeperEnsemble -
+A comma-separated list of host names (or IP addresses) of the ZooKeeper hosts that make up the ensemble with which the Kafka
+servers register their information.
+
+For the service configuration itself, the URLs need NOT be added to the list. For example:
+
+    <service>
+        <role>KAFKA</role>
+    </service>
+
+Please note that there is no `<url>` tag specified here as the URLs for the Kafka servers are obtained from ZooKeeper.

Added: knox/trunk/books/1.4.0/service_livy.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_livy.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_livy.md (added)
+++ knox/trunk/books/1.4.0/service_livy.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,52 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Livy Server ###
+
+Knox provides proxied access to Livy server for submitting Spark jobs.
+The gateway can be used to provide authentication and encryption for clients to
+servers like Livy.
+
+#### Gateway configuration ####
+
+The Gateway can be configured for Livy by modifying the topology XML file
+and providing a new service XML file.
+
+In the topology XML file, add the following with the correct hostname:
+
+    <service>
+      <role>LIVYSERVER</role>
+      <url>http://<livy-server>:8998</url>
+    </service>
+
+Livy server will use the proxyUser to run the Spark session. To prevent a user from supplying an arbitrary
+(for example, a more privileged) user here, Knox rewrites the JSON body of the request so that the value of
+proxyUser is replaced with the username of the authenticated user.
+
+    {  
+      "driverMemory":"2G",
+      "executorCores":4,
+      "executorMemory":"8G",
+      "proxyUser":"bernhard",
+      "conf":{  
+        "spark.master":"yarn-cluster",
+        "spark.jars.packages":"com.databricks:spark-csv_2.10:1.5.0"
+      }
+    } 
+
+The above is an example request body used to create a Spark session via the Livy server and illustrates the "proxyUser" value that requires the rewrite.
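+
+A minimal sketch of submitting this body through the gateway is shown below. The gateway path `/gateway/sandbox/livy/v1/sessions` and the guest credentials are assumptions based on the sandbox topology used elsewhere in this guide; adjust them for your deployment. Whatever `proxyUser` is supplied, Knox rewrites it to the authenticated user (here `guest`).
+
+    # Create a Spark session via Livy through Knox (URL and credentials are illustrative)
+    curl -ik -u guest:guest-password -X POST \
+        -H 'Content-Type: application/json' \
+        --data '{"driverMemory":"2G","executorCores":4,"executorMemory":"8G","proxyUser":"bernhard","conf":{"spark.master":"yarn-cluster"}}' \
+        'https://localhost:8443/gateway/sandbox/livy/v1/sessions'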

Added: knox/trunk/books/1.4.0/service_oozie.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/1.4.0/service_oozie.md?rev=1863668&view=auto
==============================================================================
--- knox/trunk/books/1.4.0/service_oozie.md (added)
+++ knox/trunk/books/1.4.0/service_oozie.md Tue Jul 23 21:27:15 2019
@@ -0,0 +1,200 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Oozie ###
+
+
+Oozie is a Hadoop component that allows complex job workflows to be submitted and managed.
+Please refer to the latest [Oozie documentation](http://oozie.apache.org/docs/4.2.0/) for details.
+
+In order to make Oozie accessible via the gateway there are several important Hadoop configuration settings.
+These all relate to the network endpoints exposed by various Hadoop services.
+
+The HTTP endpoint at which Oozie is running can be found via the `oozie.base.url` property in the `oozie-site.xml` file.
+In a Sandbox installation this can typically be found in `/etc/oozie/conf/oozie-site.xml`.
+
+    <property>
+        <name>oozie.base.url</name>
+        <value>http://sandbox.hortonworks.com:11000/oozie</value>
+    </property>
+
+The RPC address at which the Resource Manager exposes the JOBTRACKER endpoint can be found via the `yarn.resourcemanager.address` property in the `yarn-site.xml` file.
+In a Sandbox installation this can typically be found in `/etc/hadoop/conf/yarn-site.xml`.
+
+    <property>
+        <name>yarn.resourcemanager.address</name>
+        <value>sandbox.hortonworks.com:8050</value>
+    </property>
+
+The address at which the Name Node exposes its RPC endpoint can be found via the `dfs.namenode.rpc-address` property in the `hdfs-site.xml` file.
+In a Sandbox installation this can typically be found in `/etc/hadoop/conf/hdfs-site.xml`.
+
+    <property>
+        <name>dfs.namenode.rpc-address</name>
+        <value>sandbox.hortonworks.com:8020</value>
+    </property>
+
+If HDFS has been configured to be in High Availability mode (HA), then instead of the RPC address mentioned above for the Name Node, look up and use the logical name of the service found via `dfs.nameservices` in `hdfs-site.xml`. For example,
+
+    <property>
+        <name>dfs.nameservices</name>
+        <value>ha-service</value>
+    </property>
+
+Please note that only one of these, either the RPC endpoint or the HA service name, should be used as the NAMENODE HDFS URL in the gateway topology file.
+
+The information above must be provided to the gateway via a topology descriptor file.
+These topology descriptor files are placed in `{GATEWAY_HOME}/deployments`.
+An example that is set up for the default configuration of the Sandbox is `{GATEWAY_HOME}/deployments/sandbox.xml`.
+These values will need to be changed for a non-default Sandbox or other Hadoop cluster configuration.
+
+    <service>
+        <role>NAMENODE</role>
+        <url>hdfs://localhost:8020</url>
+    </service>
+    <service>
+        <role>JOBTRACKER</role>
+        <url>rpc://localhost:8050</url>
+    </service>
+    <service>
+        <role>OOZIE</role>
+        <url>http://localhost:11000/oozie</url>
+        <param>
+            <name>replayBufferSize</name>
+            <value>8</value>
+        </param>
+    </service>
+
+A default replayBufferSize of 8KB is shown in the sample topology file above. This may need to be increased if your request size is larger.
+
+#### Oozie URL Mapping ####
+
+For Oozie URLs, the mapping of Knox Gateway accessible URLs to direct Oozie URLs is simple.
+
+| ------- | --------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/oozie`  |
+| Cluster | `http://{oozie-host}:{oozie-port}/oozie`                                      |
+
+
+#### Oozie Request Changes ####
+
+TODO - In some cases the Oozie requests need to be slightly different when made through the gateway.
+These changes are required in order to protect the client from knowing the internal structure of the Hadoop cluster.
+
+
+#### Oozie Example via Client DSL ####
+
+This example will also submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL.
+However, in this case the job will be submitted via an Oozie workflow.
+There are several ways to do this depending upon your preference.
+
+You can use the "embedded" Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar samples/ExampleOozieWorkflow.groovy
+
+You can manually type the KnoxShell DSL script into the "embedded" Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar
+
+Each line from the file `samples/ExampleOozieWorkflow.groovy` will need to be typed or copied into the interactive shell.
+
+#### Oozie Example via cURL ####
+
+The example below illustrates the sequence of curl commands that could be used to run a "word count" MapReduce job via an Oozie workflow.
+
+It utilizes the hadoop-examples.jar from a Hadoop install for running a simple word count job.
+A copy of that jar has been included in the samples directory for convenience.
+
+In addition, a workflow definition and a configuration file are required.
+These have not been included but are available for download.
+Download [workflow-definition.xml](workflow-definition.xml) and [workflow-configuration.xml](workflow-configuration.xml) and store them in the `{GATEWAY_HOME}` directory.
+Review the contents of workflow-configuration.xml to ensure that it matches your environment.
+
+Take care to follow the instructions below where replacement values are required.
+These replacement values are identified with `{ }` markup.
+
+    # 0. Optionally clean up the test directory in case a previous example was run without cleaning up.
+    curl -i -k -u guest:guest-password -X DELETE \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&recursive=true'
+
+    # 1. Create the inode for the workflow definition file in /user/guest/example
+    curl -i -k -u guest:guest-password -X PUT \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/workflow.xml?op=CREATE'
+
+    # 2. Upload the workflow definition file.  This file can be found in {GATEWAY_HOME}/templates
+    curl -i -k -u guest:guest-password -T workflow-definition.xml -X PUT \
+        '{Value of Location header from command above}'
+
+    # 3. Create the inode for hadoop-examples.jar in /user/guest/example/lib
+    curl -i -k -u guest:guest-password -X PUT \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/lib/hadoop-examples.jar?op=CREATE'
+
+    # 4. Upload hadoop-examples.jar to /user/guest/example/lib.  Use a hadoop-examples.jar from a Hadoop install.
+    curl -i -k -u guest:guest-password -T samples/hadoop-examples.jar -X PUT \
+        '{Value of Location header from command above}'
+
+    # 5. Create the inode for the sample input file README in /user/guest/example/input.
+    curl -i -k -u guest:guest-password -X PUT \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/input/README?op=CREATE'
+
+    # 6. Upload README to /user/guest/example/input.
+    # The sample below uses the README file found in {GATEWAY_HOME}.
+    curl -i -k -u guest:guest-password -T README -X PUT \
+        '{Value of Location header from command above}'
+
+    # 7. Submit the job via Oozie
+    # Take note of the Job ID in the JSON response as this will be used in the next step.
+    curl -i -k -u guest:guest-password -H Content-Type:application/xml -T workflow-configuration.xml \
+        -X POST 'https://localhost:8443/gateway/sandbox/oozie/v1/jobs?action=start'
+
+    # 8. Query the job status via Oozie.
+    curl -i -k -u guest:guest-password -X GET \
+        'https://localhost:8443/gateway/sandbox/oozie/v1/job/{Job ID from JSON body}'
+
+    # 9. List the contents of the output directory /user/guest/example/output
+    curl -i -k -u guest:guest-password -X GET \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/output?op=LISTSTATUS'
+
+    # 10. Optionally clean up the test directory
+    curl -i -k -u guest:guest-password -X DELETE \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&recursive=true'
+
+### Oozie Client DSL ###
+
+#### submit() - Submit a workflow job.
+
+* Request
+    * text (String) - XML formatted workflow configuration string.
+    * file (String) - A filename containing XML formatted workflow configuration.
+    * action (String) - The initial action to take on the job.  Optional: Default is "start".
+* Response
+    * BasicResponse
+* Example
+    * `Workflow.submit(session).file(localFile).action("start").now()`
+
+#### status() - Query the status of a workflow job.
+
+* Request
+    * jobId (String) - The job ID to check. This is the ID received when the job was created.
+* Response
+    * BasicResponse
+* Example
+    * `Workflow.status(session).jobId(jobId).now().string`
+
+### Oozie HA ###
+
+Please look at #[Default Service HA support]

