[ 
https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693563#comment-13693563
 ] 

Sanjay Radia commented on HADOOP-9671:
--------------------------------------

Here is an initial draft of Hadoop security usage scenarios, a threat model, 
and the problems that we would like to address.


*Hadoop Deployment Usage Scenarios*

The use cases below have two variations: with and without perimeter security 
(such as Knox).

* U1 Hadoop insecure deployment (i.e. using UGI-based “authentication”)
* U2 Hadoop deployment with Active Directory (Kerberos, LDAP) authentication
* U3 Hadoop deployment with Kerberos authentication
* U4 Hadoop deployment in an LDAP-only shop
* U5 Hadoop deployment in a public cloud (e.g. AWS, Azure, Rackspace)
* U6 Multiple Hadoop clusters in a single organization each with different 
authentication requirements and potentially different IdPs for each.


*Security Threat Model for Hadoop*
(This list is an extension of the list published in 
http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf)
# An unauthorized client may access an HDFS file via the RPC or via HTTP 
protocols.
# An unauthorized client may read/write a data block of a file at a DataNode 
via the pipeline streaming data-transfer protocol.
# An unauthorized user may submit a job to a queue, or delete or change the 
priority of a job.
# An unauthorized client may access the intermediate data of a Map job via its 
task tracker's HTTP shuffle protocol.
# An executing task may use the host operating system interfaces to access 
other tasks, or to access local data, which includes intermediate Map output 
and the local storage of the DataNode that runs on the same physical node.
# A task may masquerade as a Hadoop service component such as a DataNode, 
NameNode, job tracker, task tracker etc.
# A user may submit a workflow to Oozie as another user.
# A service may attempt to impersonate a user by using the client-presented 
service access token.
# A service may attempt to impersonate another service by using the 
service-presented service access token (when a service is acting as a client of 
another).
# A user may attempt to register as a service through service registration 
endpoints (is this the same as 6?).

*Hadoop Security Problems*
# Perimeter security solution - Knox addresses this
# Remove the need to create Unix accounts on each compute node - (note Unix 
accounts are merely for isolation and not for authentication.) Linux containers 
have the potential to fix this.
# Remove the need for root startup for DataNodes (HDFS-2856)
# Server authentication setup is painful, i.e. installing keytabs for each 
server. A simpler solution is needed for server-server mutual authentication 
(e.g. NN-DN) and for client-server mutual authentication.
# Authentication for customers with only LDAP (both SSO jiras, HADOOP-9392 and 
HADOOP-9533, are addressing this)
# Hadoop authentication should include group membership so that group 
membership checking is not needed later. Note that this is critical for cloud 
deployment: in a public cloud deployment it is not practical to call back from 
the cloud to the customer’s environment to get group membership. (Both SSO 
jiras, HADOOP-9392 and HADOOP-9533, are addressing this.) Related to problem 
12.
# Remove the shared secret between NN and DN (potentially extensions to the SSO 
jiras)
# Remove the need for NN and JT delegation tokens (potentially extensions to 
the SSO jiras)
# Encryption on communication pipes - verify configurations and test
# Encryption on data. One solution is to use OS-level encryption; someone needs 
to verify and test this.
# Add ACLs to HDFS
# Change Hadoop tokens to include group membership - see the Azure use case U5 
above. Hadoop tokens need to support arbitrary attributes for ABAC.
# Implementation improvements and bugs
** Change the Hadoop security implementation so that UGI (i.e. non-secure 
Hadoop deployment) uses delegation tokens and block access tokens. (HADOOP-8779)
** Change the implementation of Hadoop RPC security to make the authentication 
pluggable - note that architecturally Hadoop RPC authentication is pluggable 
but the code has UGI and Kerberos too deeply burnt in.
# Provide the ability to identify poorly or maliciously behaving applications, 
independently of applications from the same user that may be behaving 
properly. Note this is not a security issue per se, but we lack an 
application/job identity that could be used to throttle a misbehaving 
application. The Hadoop job/HDFS delegation token could be used for that 
purpose - is this a reasonable use for it?
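
To make problems 6 and 12 concrete, here is a minimal sketch of a token that 
carries group membership and arbitrary attributes, so that a service can make 
a group or ABAC decision without calling back to the customer's directory. 
Everything in it is an assumption for illustration only: the names 
issue_token, verify_token, is_allowed and SIGNING_KEY are hypothetical, the 
token layout is not the Hadoop token format, and the HMAC shared key merely 
stands in for whatever signing scheme the SSO jiras end up adopting.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical signing key, for illustration only. A real deployment would
# use a key (or key pair) held by the identity provider.
SIGNING_KEY = b"demo-signing-key"


def issue_token(user, groups, attributes, key=SIGNING_KEY):
    """Return 'payload.signature' with groups and attributes in the payload."""
    payload = json.dumps(
        {"user": user, "groups": sorted(groups), "attrs": attributes},
        sort_keys=True,
    ).encode("utf-8")
    body = base64.urlsafe_b64encode(payload)
    sig = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    return (body + b"." + sig).decode("ascii")


def verify_token(token, key=SIGNING_KEY):
    """Check the signature and return the claims, or raise ValueError."""
    body, _, sig = token.encode("ascii").rpartition(b".")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(sig, expected):
        raise ValueError("token signature mismatch")
    return json.loads(base64.urlsafe_b64decode(body))


def is_allowed(claims, required_group=None, required_attrs=None):
    """Decide access from the token claims alone, with no directory lookup."""
    if required_group is not None and required_group not in claims["groups"]:
        return False
    for name, value in (required_attrs or {}).items():
        if claims["attrs"].get(name) != value:
            return False
    return True


if __name__ == "__main__":
    token = issue_token("alice", ["analysts", "etl"], {"dept": "finance"})
    claims = verify_token(token)
    print(is_allowed(claims, required_group="etl"))
    print(is_allowed(claims, required_attrs={"dept": "hr"}))
```

The point of the sketch is only that the authorization check reads the groups 
and attributes out of the verified token; how the token is signed, scoped and 
expired is left open here.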


  
                
> Improve Hadoop security - master jira
> -------------------------------------
>
>                 Key: HADOOP-9671
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9671
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
