Github user vijikarthi commented on the issue:

    https://github.com/apache/flink/pull/2275
  
    Adding some more context on the implementation details, which are based on 
the design proposal 
(https://docs.google.com/document/d/1-GQB6uVOyoaXGwtqwqLV8BHDxWiMO2WnVzBoJ8oPaAs/edit?usp=sharing)
    
    The current security implementation works in a subtle way: it relies on the 
Kerberos cache of the user who starts the Flink process/jobs, and only in the 
context of supporting secure access to a Hadoop cluster. The underlying UGI 
implementation of the Hadoop infrastructure is used to harden the security 
using the keytab cache. For the Yarn mode of deployment, delegation tokens are 
created and populated to the container environment (App Master/JM and TM). 
    
    There are two areas where the current implementation falls short:
    1) Tokens eventually expire, which impacts long-running jobs
    2) There is no support for secure connections to Kafka and ZK (Kafka 0.9 
and the latest ZK versions support Kerberos-based authentication using 
SASL/JAAS)
    
    This PR addresses the above gaps by providing keytab support for 
communicating securely with Hadoop and Kafka/ZK services.
    
    1) Additional Configurations: 
    
    The following new security-specific configurations are added to the Flink 
configuration file.
    a) security.principal - user principal that Flink process/connectors should 
authenticate as 
    b) security.keytab - keytab file location
    
    In standalone mode, it is assumed that the configuration already exists 
(manual process) on all cluster nodes where the JM and TMs will run. 
    
    In Yarn mode, the configuration (and keytab file) is expected only on the 
node from which YarnCLI or FlinkCLI is invoked. Application code takes care of 
copying the keytab file to the JM/TM Yarn containers as a local resource for 
lookup.
    
    When the security configurations are not provided, the delegation token 
mechanism still works, for backward compatibility (manual kinit before 
starting JM/TMs).
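
    The choice between the two paths described above can be sketched roughly 
as follows. This is a minimal illustration, not the code in the PR: the class 
and enum names are hypothetical, only the two configuration keys come from the 
list above, and treating a half-configured pair (only one key set) as 
"not configured" is an assumption.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Flink: picks the authentication path from the config. */
final class SecurityModeSketch {

    /** KEYTAB = new behaviour in this PR; TICKET_CACHE = existing kinit/delegation-token path. */
    enum Mode { KEYTAB, TICKET_CACHE }

    // Key names as listed in the comment above.
    static final String PRINCIPAL_KEY = "security.principal";
    static final String KEYTAB_KEY = "security.keytab";

    /**
     * Returns KEYTAB only when both keys are present and non-blank.
     * Assumption: anything else falls back to the backward-compatible path.
     */
    static Mode resolve(Properties conf) {
        String principal = conf.getProperty(PRINCIPAL_KEY, "").trim();
        String keytab = conf.getProperty(KEYTAB_KEY, "").trim();
        boolean configured = !principal.isEmpty() && !keytab.isEmpty();
        return configured ? Mode.KEYTAB : Mode.TICKET_CACHE;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PRINCIPAL_KEY, "flink-user@EXAMPLE.COM"); // placeholder principal
        conf.setProperty(KEYTAB_KEY, "/path/to/flink.keytab");     // placeholder path
        System.out.println(resolve(conf));
    }
}
```

    A real implementation might prefer to reject a configuration that sets only 
one of the two keys rather than silently falling back.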
    
    2) Process-wide in-memory JAAS configuration to enable Kafka/ZK secure 
authentication.
     
    The JAAS configuration plays a critical role in authentication for 
Kerberized applications. The Kafka/ZK login module code is expected to 
construct a login context based on the supplied JAAS configuration file entries 
and authenticate to produce a subject. The context is constructed with an 
application name, which acts as a lookup key into the configuration, yielding 
one or more login modules. The login module implements the specific strategy, 
such as using a configured keytab or using the user’s ticket cache.
    
    Instead of managing a per-connector JAAS configuration file, a process-wide 
JAAS configuration object is initialized during the Flink bootstrap phase, thus 
providing a single login module to all callers, configured to log in using the 
supplied keytab.
    
(https://docs.oracle.com/javase/7/docs/api/javax/security/auth/login/Configuration.html#setConfiguration(javax.security.auth.login.Configuration))
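
    To make the idea concrete, here is a minimal sketch of such a process-wide 
in-memory JAAS configuration. It is not the class from the PR: the class name 
and the install() helper are hypothetical, and the principal and keytab path 
in the usage note are placeholders. It only relies on the standard 
javax.security.auth.login API and the option names of the JDK's 
Krb5LoginModule.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Hypothetical in-memory JAAS configuration; the name is illustrative, not Flink's. */
final class KeytabJaasConfigurationSketch extends Configuration {

    private final AppConfigurationEntry[] entries;

    KeytabJaasConfigurationSketch(String principal, String keytabPath) {
        Map<String, String> options = new HashMap<>();
        // Options understood by the JDK's Krb5LoginModule.
        options.put("useKeyTab", "true");
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true");
        this.entries = new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String applicationName) {
        // Every application name resolves to the same keytab-based login module,
        // so no per-connector JAAS file section is needed.
        return entries.clone();
    }

    /** Installs the configuration for the whole JVM process. */
    static void install(String principal, String keytabPath) {
        Configuration.setConfiguration(
            new KeytabJaasConfigurationSketch(principal, keytabPath));
    }
}
```

    After a call such as install("flink-user@EXAMPLE.COM", 
"/path/to/flink.keytab") during bootstrap, a lookup through 
Configuration.getConfiguration() returns the keytab entry regardless of the 
application name a client asks for (for example "KafkaClient" or "Client", the 
section names Kafka and ZK clients conventionally use).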
    
    To summarize, the following sequence happens when the secure configuration 
is enabled.
    Flink bootstrap code (both Yarn and Standalone) initializes the security 
context by
    a) Initializing UGI with the supplied keytab and principal, which takes care 
of handling Kerberos authentication and login renewal for Hadoop services. 
    b) Creating a process-wide JAAS configuration object for Kafka/ZK login 
modules to support Kerberos/SASL authentication. Login renewals are handled 
automatically by the ZK and Kafka login module implementations.
    
    Some additional details are provided in the documentation page as well, 
which can be found here.
    
(https://github.com/vijikarthi/flink/blob/FLINK-3929/docs/internals/flink_security.md)

