Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2275#discussion_r72429497
  
    --- Diff: docs/internals/flink_security.md ---
    @@ -0,0 +1,87 @@
    +---
    +title:  "Flink Security"
    +# Top navigation
    +top-nav-group: internals
    +top-nav-pos: 10
    +top-nav-title: Flink Security
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +This document briefly describes how Flink security works in the context of various deployment mechanism (Standalone/Cluster vs YARN)
    +and the connectors that participates in Flink Job execution stage. This documentation can be helpful for both administrators and developers
    +who plans to run Flink on a secure environment.
    +
    +## Objective
    +
    +The primary goal of Flink security model is to enable secure data access for jobs within a cluster via connectors. In production deployment scenario,
    +streaming jobs are understood to run for longer period of time (days/weeks/months) and the system must be  able to authenticate against secure
    +data sources throughout the life of the job. The current implementation supports running Flink cluster (Job Manager/Task Manager/Jobs) under the
    +context of a Kerberos identity based on Keytab credential supplied during deployment time. Any jobs submitted will continue to run in the identity of the cluster.
    +
    +## How Flink Security works
    +Flink deployment includes running Job Manager/ZooKeeper, Task Manager(s), Web UI and Job(s). Jobs (user code) can be submitted through web UI and/or CLI.
    +A Job program may use one or more connectors (Kafka, HDFS, Cassandra, Flume, Kinesis etc.,) and each connector may have a specific security
    +requirements (Kerberos, database based, SSL/TLS, custom etc.,). While satisfying the security requirements for all the connectors evolve over a period
    +of time but at this time of writing, the following connectors/services are tested for Kerberos/Keytab based security.
    +
    +- Kafka (0.9)
    +- HDFS
    +- ZooKeeper
    +
    +Hadoop uses UserGroupInformation (UGI) class to manage security. UGI is a static implementation that takes care of handling Kerberos authentication. Flink bootstrap implementation
    +(JM/TM/CLI) takes care of instantiating UGI with appropriate security credentials to establish necessary security context.
    +
    +Services like Kafka and ZooKeeper uses SASL/JAAS based authentication mechanism to authenticate against a Kerberos server. It expects JAAS configuration with platform-specific login
    +module *name* to be provided. Managing per-connector configuration files will be an overhead and to overcome this requirement, a process-wide JAAS configuration object is
    +instantiated which serves standard ApplicationConfigurationEntry for the connectors that authenticates using SASL/JAAS mechanism.
    +
    +It is important to understand that the Flink processes (JM/TM/UI/Jobs) itself uses UGI's doAS() implementation to run under specific user context i.e., if Hadoop security is enabled
    +then the Flink processes will be running under secure user account or else it will run as the OS login user account who starts Flink cluster.
    +
    +## Security Configurations
    +
    +Secure credentials can be supplied by adding below configuration elements to Flink configuration file:
    +
    +- `security.keytab`: Absolute path to Kerberos keytab file that contains the user credentials/secret.
    +
    +- `security.principal`: User principal name that the Flink cluster should run as.
    +
    +Delegation token mechanism (*kinit cache*) is still supported for backward compatibility but enabling security using *keytab* configuration is the preferred and recommended approach.
    --- End diff --
    
    The last sentence is missing its leading article: "*The* Delegation token mechanism ..."
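
As a side note on the JAAS paragraph in the diff, the idea of a process-wide JAAS configuration object could look roughly like the sketch below. This is a minimal illustration, not the code in this pull request: the class name `KeytabJaasConfiguration`, the keytab path, and the principal are hypothetical. It relies only on the JDK's `javax.security.auth.login.Configuration` and `AppConfigurationEntry`, and it assumes an OpenJDK/Oracle JDK, where the Kerberos login module is `com.sun.security.auth.module.Krb5LoginModule` (other vendors use a different module name, which is the "platform-specific" part).

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Hypothetical keytab-backed JAAS configuration; a sketch, not Flink's actual class. */
final class KeytabJaasConfiguration extends Configuration {

    private final AppConfigurationEntry[] entries;

    KeytabJaasConfiguration(String keytabPath, String principal) {
        // Standard Krb5LoginModule options for a non-interactive keytab login.
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true");

        // Assumption: OpenJDK/Oracle JDK login module name.
        entries = new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // Serve the same entry for every section name a connector asks for,
        // so no per-connector JAAS file is needed.
        return entries;
    }

    public static void main(String[] args) {
        // Placeholder path and principal, for illustration only.
        Configuration.setConfiguration(
            new KeytabJaasConfiguration("/path/to/flink.keytab", "flink-user@EXAMPLE.COM"));

        // A SASL client would look up its section by name, e.g. "KafkaClient".
        AppConfigurationEntry[] found =
            Configuration.getConfiguration().getAppConfigurationEntry("KafkaClient");
        System.out.println(found[0].getLoginModuleName());
    }
}
```

Installing the object once via `Configuration.setConfiguration` makes it visible to every JAAS lookup in the JVM, which is what removes the need for a separate configuration file per connector. No Kerberos login is attempted here; the sketch only shows how the entry is served.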

