[ 
https://issues.apache.org/jira/browse/PIG-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179484#comment-15179484
 ] 

Rohini Palaniswamy commented on PIG-4796:
-----------------------------------------

bq. It is my understanding that the client side Hadoop code (in this case in the 
pig client) remembers the kerberos keytab and ships that information into the 
cluster. 
  Keytab information is not shipped to the cluster; that would be a security 
issue. Kerberos authentication is done with the Hadoop services (NN, RM, Job 
History Server, Application Timeline Server, HCatalog, HBase, etc.), which 
hand out delegation tokens. Those tokens are then shipped as Credentials via 
the private distributed cache and used to talk to those services.
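As a rough illustration of the delegation-token flow described above (a hedged sketch, not Pig's actual submission code; the renewer principal is a made-up example), the Hadoop client-side API looks roughly like this:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.Credentials;

public class DelegationTokenSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "token-demo");

        // The client authenticates to the NameNode with Kerberos once,
        // then asks it for a delegation token on behalf of the job.
        // "rm/cluster@EXAMPLE.COM" is a placeholder renewer principal.
        Credentials creds = job.getCredentials();
        FileSystem fs = FileSystem.get(conf);
        fs.addDelegationTokens("rm/cluster@EXAMPLE.COM", creds);

        // The tokens travel with the job (via the private distributed
        // cache); tasks present them instead of a keytab when talking
        // to HDFS and the other services.
    }
}
```

Note that this requires an already-authenticated Kerberos client and a live secured cluster; it is only meant to show where the Credentials come from.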

bq. As far as I know it is there that the tasks have an exception handling 
mechanism to re-login (using the provided keytab data) when a kerberos failure 
occurs.
  SaslRpcClient re-logins with the keytab on any connection failure, which 
handles expiration of TGTs. I had forgotten about that and was thinking you 
might have to re-login in code between job submissions. We had to do re-logins 
ourselves in code for some HTTP services we authenticated to with SPNEGO, but 
Pig is going to communicate mostly through SaslRpcClient, which does it for 
you. So your patch is good.
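For the SPNEGO-style case mentioned above, where the RPC layer does not re-login for you, the manual re-login is roughly the following (a hedged sketch; the principal and keytab path are placeholders, not values from this issue):

```java
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

public class ManualReloginSketch {
    public static void main(String[] args) throws IOException {
        // Initial login from a keytab (placeholder principal and path).
        UserGroupInformation.loginUserFromKeytab(
                "user@EXAMPLE.COM", "/path/to/user.keytab");

        // Before each batch of HTTP calls, refresh the TGT if it is
        // close to expiry; this is a no-op while the TGT is still fresh.
        UserGroupInformation ugi = UserGroupInformation.getLoginUser();
        ugi.checkTGTAndReloginFromKeytab();
    }
}
```

This is the kind of periodic check a long-running client has to do itself when its protocol bypasses SaslRpcClient.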

Will commit the patch tomorrow morning.

> Authenticate with Kerberos using a keytab file
> ----------------------------------------------
>
>                 Key: PIG-4796
>                 URL: https://issues.apache.org/jira/browse/PIG-4796
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.15.0
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>              Labels: feature, kerberos, security
>         Attachments: 2016-02-18-1510-PIG-4796.patch, 
> 2016-02-18-PIG-4796-rough-proof-of-concept.patch, PIG-4796-2016-02-23.patch, 
> PIG-4796-4.patch
>
>
> When running in a Kerberos-secured environment, users are faced with the 
> limitation that their jobs cannot run longer than the (remaining) ticket 
> lifetime of their Kerberos tickets. In the environment I work in, these 
> tickets expire after 10 hours, limiting the maximum job duration to at most 
> 10 hours (which is a problem).
> In the Hadoop tooling there is a feature where you can authenticate using a 
> Kerberos keytab file (essentially a file that contains the encrypted form of 
> the Kerberos principal and password). Using this, the running application can 
> request new tickets from the Kerberos server when the initial tickets expire.
> In my Java/Hadoop applications I commonly include these two lines:
> {code}
> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
> UserGroupInformation.loginUserFromKeytab("nbas...@xxxxxx.net", 
> "/home/nbasjes/.krb/nbasjes.keytab");
> {code}
> This way I have run an Apache Flink based application for more than 170 hours 
> (about a week) on the Kerberos-secured YARN cluster.
> What I propose is a feature where I can set the relevant Kerberos values in 
> my pig script and from there be able to run a pig job for many days on the 
> secured cluster.
> Proposal how this can look in a pig script:
> {code}
> SET java.security.krb5.conf '/etc/krb5.conf'
> SET job.security.krb5.principal 'nbas...@xxxxxx.net'
> SET job.security.krb5.keytab '/home/nbasjes/.krb/nbasjes.keytab'
> {code}
> So if all of these are set (or at least the last two), the aforementioned 
> UserGroupInformation.loginUserFromKeytab method is called before submitting 
> the job to the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)