Github user arunmahadevan commented on a diff in the pull request:
https://github.com/apache/storm/pull/2099#discussion_r116670032
--- Diff: external/storm-hive/README.md ---
@@ -101,6 +101,80 @@ Hive Trident state also follows similar pattern to
HiveBolt it takes in HiveOpti
TridentState state = stream.partitionPersist(factory, hiveFields, new
HiveUpdater(), new Fields());
```
+
+
+##Working with Secure Hive
+If your topology is going to interact with secure Hive, your bolts/states
need to be authenticated by Hive Server. We
+currently have 2 options to support this:
+
+### Using keytabs on all worker hosts
+If you have distributed the keytab files for the hive user to all potential
worker hosts, you can use this method. Specify the
+Hive configs using the HiveOptions.withKerberosKeytab() and
HiveOptions.withKerberosPrincipal() methods.
+
+On worker hosts the bolt/trident-state code will use the keytab file with
the principal provided in the config to authenticate with
+Hive. This method is a little dangerous, as you need to ensure all workers
have the keytab file at the same location, and you need
+to remember this as you bring up new hosts in the cluster.
+
+
+### Using Hive MetaStore delegation tokens
+Your administrator can configure nimbus to automatically get delegation
tokens on behalf of the topology submitter user.
+Since Hive depends on HDFS, we should also configure HDFS delegation
tokens. Nimbus should be started with the following configurations:
+
+More details about Hadoop Tokens here:
https://github.com/apache/storm/blob/master/docs/storm-hive.md
+
+```
+nimbus.autocredential.plugins.classes :
["org.apache.storm.hive.security.AutoHive",
"org.apache.storm.hdfs.security.AutoHDFS"]
+nimbus.credential.renewers.classes :
["org.apache.storm.hive.security.AutoHive",
"org.apache.storm.hdfs.security.AutoHDFS"]
+nimbus.credential.renewers.freq.secs : 82800 (23 hours)
+
+hive.keytab.file: "/path/to/keytab/on/nimbus" (This is the keytab of hive
super user that can impersonate other users.)
+hive.kerberos.principal: "[email protected]"
+hive.metastore.uris: "thrift://server:9083"
+
+//hdfs configs
+hdfs.keytab.file: "/path/to/keytab/on/nimbus" (This is the keytab of hdfs
super user that can impersonate other users.)
+hdfs.kerberos.principal: "[email protected]"
+```
+
+Your topology configuration should have:
+
+```
+topology.auto-credentials :["org.apache.storm.hive.security.AutoHive",
"org.apache.storm.hdfs.security.AutoHDFS"]
+```
+
+If nimbus does not have the above configuration, you need to add it and then
restart nimbus. Ensure the hadoop configuration
+files (core-site.xml, hdfs-site.xml and hive-site.xml) and the storm-hive
connector jar with all the dependencies are present in nimbus's classpath.
+
+As an alternative to adding the configuration files (core-site.xml,
hdfs-site.xml and hive-site.xml) to the classpath, you could specify the
configurations
+as a part of the topology configuration. E.g. in your custom storm.yaml (or
-c option while submitting the topology),
+
+```
+hiveCredentialsConfigKeys : ["cluster1", "cluster2"] (the hive clusters
you want to fetch the tokens from)
+cluster1: [{"config1": "value1", "config2": "value2", ... }] (A map of
config key-values specific to cluster1)
+cluster2: [{"config1": "value1", "hive.keytab.file":
"/path/to/keytab/for/cluster2/on/nimbus", "hive.kerberos.principal":
"[email protected]", "hive.metastore.uris": "thrift://server:9083"}]
(here along with other configs, we have custom keytab and principal for
"cluster2" which will override the keytab/principal specified at topology level)
+
+hdfsCredentialsConfigKeys : ["cluster1", "cluster2"] (the hdfs clusters
you want to fetch the tokens from)
+cluster1: [{"config1": "value1", "config2": "value2", ... }] (A map of
config key-values specific to cluster1)
+cluster2: [{"config1": "value1", "hdfs.keytab.file":
"/path/to/keytab/for/cluster2/on/nimbus", "hdfs.kerberos.principal":
"[email protected]"}] (here along with other configs, we have custom
keytab and principal for "cluster2" which will override the keytab/principal
specified at topology level)
--- End diff --
The value for each cluster key should be a map, not a list wrapping a map.
Take a look at the HDFS and HBase docs, which were fixed recently.
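
For illustration, the Hive entries might look roughly like this once the value is a plain map. This is only a sketch that reuses the placeholder keys and paths from the diff above ("config1", "cluster1", the keytab path); the principal entry is left out here and would go into the same map, and the hdfsCredentialsConfigKeys entries would presumably change the same way.

```yaml
# The hive clusters you want to fetch the tokens from.
hiveCredentialsConfigKeys: ["cluster1", "cluster2"]

# Each cluster key maps directly to a map of config key-values (no enclosing list).
cluster1: {"config1": "value1", "config2": "value2"}

# Per-cluster keytab settings go in the same map and override the topology-level ones.
cluster2: {"config1": "value1", "hive.keytab.file": "/path/to/keytab/for/cluster2/on/nimbus", "hive.metastore.uris": "thrift://server:9083"}
```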