[ https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693563#comment-13693563 ]
Sanjay Radia commented on HADOOP-9671:
--------------------------------------

Here is an initial draft of the Hadoop security usage scenarios, threat model, and problems that we would like to address.

*Hadoop Deployment Usage Scenarios*

The use cases below have two variations: with and without perimeter security (such as Knox).
* U1 Hadoop insecure deployment (i.e. using UGI-based “authentication”)
* U2 Hadoop deployment with Active Directory (Kerberos, LDAP) authentication
* U3 Hadoop deployment with Kerberos authentication
* U4 Hadoop deployment in an LDAP-only shop
* U5 Hadoop deployment in a public cloud (e.g. AWS, Azure, Rackspace)
* U6 Multiple Hadoop clusters in a single organization, each with different authentication requirements and potentially a different IdP.

*Security Threat Model for Hadoop*

(This list is an extension of the list published in http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf)
# An unauthorized client may access an HDFS file via the RPC or HTTP protocols.
# An unauthorized client may read/write a data block of a file at a DataNode via the pipeline streaming data-transfer protocol.
# An unauthorized user may submit a job to a queue, or delete a job or change its priority.
# An unauthorized client may access intermediate data of a Map job via its task tracker's HTTP shuffle protocol.
# An executing task may use the host operating system interfaces to access other tasks, or to access local data, including intermediate Map output or the local storage of the DataNode running on the same physical node.
# A task may masquerade as a Hadoop service component such as a DataNode, NameNode, job tracker, task tracker, etc.
# A user may submit a workflow to Oozie as another user.
# A service may attempt to impersonate a user by using the client-presented service access token.
# A service may attempt to impersonate another service by using the service-presented service access token (when a service is acting as a client of another).
# A user may attempt to register as a service through service registration endpoints (is this the same as 6?).

*Hadoop Security Problems*

# Perimeter security solution - Knox addresses this.
# Remove the need to create Unix accounts on each compute node. (Note that Unix accounts are merely for isolation and not for authentication.) Linux containers have the potential to fix this.
# Remove the need for root startup for DataNodes (HDFS-2856).
# Server authentication setup is painful, i.e. installing keytabs for each server. A simpler solution is needed for server-server mutual authentication (e.g. NN-DN) and client-server mutual authentication.
# Authentication for customers with only LDAP. (Both SSO jiras, HADOOP-9392 and HADOOP-9533, are addressing this.)
# Hadoop authentication should include group membership so that group membership checking is not needed later. Note this is critical for public cloud deployments, where it is not practical to call back from the cloud to the customer’s environment to get group membership. (Both SSO jiras, HADOOP-9392 and HADOOP-9533, are addressing this.) Related to problem 12.
# Remove the shared secret between NN and DN (potentially extensions to the SSO jiras).
# Remove the need for NN and JT delegation tokens (potentially extensions to the SSO jiras).
# Encryption on communication pipes - verify configurations and test.
# Encryption on data. One solution is to use OS-level encryption - someone needs to verify and test this.
# Add ACLs to HDFS.
# Change Hadoop tokens to include group membership - see the Azure use case U5 above. Hadoop tokens need to support arbitrary attributes for ABAC.
# Implementation improvements and bugs
** Change the Hadoop security implementation so that UGI (i.e. a non-secure Hadoop deployment) uses delegation tokens and block access tokens. (HADOOP-8779)
** Change the implementation of Hadoop RPC security to make the authentication pluggable - note that architecturally Hadoop RPC authentication is pluggable, but the code has UGI and Kerberos too deeply burnt in.
# Provide the ability to identify poorly or maliciously behaving applications, independently from applications of the same user that may be behaving properly. Note this is not a security issue per se, but we lack an application/job identity that could be used to throttle a misbehaving application. The Hadoop job/HDFS delegation token could be used for that purpose - is this a reasonable use for it?

> Improve Hadoop security - master jira
> -------------------------------------
>
> Key: HADOOP-9671
> URL: https://issues.apache.org/jira/browse/HADOOP-9671
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
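
To make problems 6 and 12 above a little more concrete, here is a minimal sketch of a token that carries group membership and arbitrary attributes, so a server can make group and ABAC decisions without calling back to the identity provider. This is not Hadoop's token format or API: the function names (`issue_token`, `verify_token`, `is_authorized`), the JSON payload layout, and the HMAC-with-shared-secret signing are all assumptions chosen only for illustration (a real design would likely avoid the shared secret, per problem 7).

```python
import hashlib
import hmac
import json


def issue_token(secret: bytes, user: str, groups: list, attributes: dict) -> dict:
    """Build a token whose signed payload embeds groups and attributes (hypothetical layout)."""
    payload = json.dumps(
        {"user": user, "groups": sorted(groups), "attributes": attributes},
        sort_keys=True,
    )
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_token(secret: bytes, token: dict) -> dict:
    """Return the claims if the signature matches, otherwise raise ValueError."""
    expected = hmac.new(
        secret, token["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        raise ValueError("token signature mismatch")
    return json.loads(token["payload"])


def is_authorized(claims: dict, required_group: str, required_attrs: dict) -> bool:
    """Group check plus a simple attribute check, using only data inside the token."""
    if required_group not in claims["groups"]:
        return False
    return all(claims["attributes"].get(k) == v for k, v in required_attrs.items())
```

A server holding the verification key would call `verify_token` once and then pass the returned claims to `is_authorized` for each access decision; no lookup against the customer's directory is needed at that point.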