[ 
https://issues.apache.org/jira/browse/HDFS-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837959#comment-13837959
 ] 

Adam Faris commented on HDFS-5569:
----------------------------------

I'll attempt to answer the outstanding questions.  Please let me know if I 
missed anything.   
----
Haohui Mai: What is authorization?  Authorization is a function that specifies 
access rights to resources.   http://en.wikipedia.org/wiki/Authorization

For example, I have a US passport that has my name and photo and is not 
expired.  I walk into a bank and hand my passport to the bank teller who looks 
at the photo, recognizes my face, verifies the date and watermarks.  Everything 
checks out and I am now authenticated to the bank.  I now ask the teller to 
withdraw $100 million from Doug Cutting's bank account.  The bank teller 
checks to see if I have access and says I am not authorized to make the 
withdrawal.  Kerberos only provides authentication, not authorization.  In this 
example my passport is the TGT and the bank teller is WebHDFS.  WebHDFS needs 
to have better authorization built into it.  

How about using a transparent proxy?  Using nginx or Traffic Server is an 
interesting idea, but it's not a good solution.  One needs to deploy the proxy 
software and configs to all nodes.  Then how would the URL mappings work?  One 
asks the namenode for file locations and the 307 response would point to the 
wrong port on the datanode.  What about troubleshooting GSSAPI errors?  Is it 
the client or the proxy?  Having personally supported the CDN at Yahoo!, I know 
firsthand the issues of troubleshooting web applications that use proxies.
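
To illustrate the URL mapping problem, here is a hedged sketch.  It assumes the 
datanode serves WebHDFS on port 50075 and that a proxy listens on port 8075 on 
the same host.  The host name, the proxy port and the RedirectRewrite helper 
are all made up for illustration and are not part of Hadoop.  Every Location 
header in a 307 response from the namenode would need a rewrite along these 
lines before the client follows it:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop: swaps the port in a redirect URL.
final class RedirectRewrite {

    /** Returns the same URL with its port replaced by proxyPort. */
    static String withPort(String location, int proxyPort) {
        try {
            URI u = new URI(location);
            StringBuilder sb = new StringBuilder();
            sb.append(u.getScheme()).append("://")
              .append(u.getHost()).append(':').append(proxyPort)
              .append(u.getRawPath());          // raw form keeps percent-escapes intact
            if (u.getRawQuery() != null) {
                sb.append('?').append(u.getRawQuery());
            }
            return sb.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad Location header: " + location, e);
        }
    }
}
```

The rewrite itself is small, but it has to happen on every redirect, and the 
proxy then has to forward the request to the real datanode port.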
----
Alejandro Abdelnur:  Reverse DNS lookup penalties?  Assuming we are filtering 
by hostname and not by IP network, we will have to agree to disagree on reverse 
DNS lookups being a blocker for this request.  While it's true in theory, in 
practice one more DNS query is not going to make a difference to an individual 
datanode.  Even if someone attempts to DDoS the cluster with client 
connections, other problems will appear before reverse lookup resolution 
becomes the blocker.

Why not use HttpFS/Hoop?  I'm unable to find references to HttpFS/Hoop in the 
1.2.1 (stable) source tree, so it appears to be a 2.x feature?  Even if 
HttpFS/Hoop is compatible with Hadoop 1.2.x, it's going to have the 
above-mentioned proxy issues.  Troubleshooting client requests is going to be 
more complicated, and configuration and deployment are going to be more 
complicated as we now have to securely manage Tomcat.  Using a proxy comes with 
a lot of overhead and is not a good solution for this request.
----

Alejandro's comment on using Tomcat to support my request is almost spot on.  
But instead of Tomcat supporting the access control feature, it should be 
Jetty, as Jetty offers the ability to block by source IP and is already 
included with Hadoop.  This is why I opened this JIRA: WebHDFS needs to be 
updated to offer the ability to block or grant access by IP.
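
A minimal sketch of the kind of check being requested.  It assumes rules are 
given as CIDR blocks, that deny rules win over allow rules, and that anything 
unmatched is denied.  The IpAccessList class and its method names are 
hypothetical and are not an existing Hadoop or Jetty API.  In a servlet filter 
the client address would come from ServletRequest.getRemoteAddr().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or Jetty.
final class IpAccessList {

    /** One CIDR block, e.g. 10.0.0.0/8. */
    private static final class Cidr {
        private final byte[] network;
        private final int prefixLen;

        Cidr(String spec) {
            String[] parts = spec.split("/");
            this.network = parse(parts[0]).getAddress();
            // A bare address with no "/n" suffix means a single host.
            this.prefixLen = parts.length > 1
                    ? Integer.parseInt(parts[1])
                    : network.length * 8;
            if (prefixLen < 0 || prefixLen > network.length * 8) {
                throw new IllegalArgumentException("bad prefix length: " + spec);
            }
        }

        boolean contains(InetAddress addr) {
            byte[] a = addr.getAddress();
            if (a.length != network.length) {
                return false;   // IPv4 rule against IPv6 client, or vice versa
            }
            int fullBytes = prefixLen / 8;
            for (int i = 0; i < fullBytes; i++) {
                if (a[i] != network[i]) return false;
            }
            int remBits = prefixLen % 8;
            if (remBits == 0) return true;
            // Compare only the leading remBits bits of the next byte.
            int mask = (0xFF << (8 - remBits)) & 0xFF;
            return (a[fullBytes] & mask) == (network[fullBytes] & mask);
        }
    }

    private final List<Cidr> allowRules = new ArrayList<>();
    private final List<Cidr> denyRules = new ArrayList<>();

    void allow(String cidr) { allowRules.add(new Cidr(cidr)); }
    void deny(String cidr)  { denyRules.add(new Cidr(cidr)); }

    /** Deny rules are checked first; unmatched clients are denied (assumed policy). */
    boolean isAllowed(String clientIp) {
        InetAddress addr = parse(clientIp);
        for (Cidr c : denyRules)  { if (c.contains(addr)) return false; }
        for (Cidr c : allowRules) { if (c.contains(addr)) return true; }
        return false;
    }

    // Literal IP addresses are parsed without a DNS query; a host name here
    // would trigger a lookup, so callers should pass addresses only.
    private static InetAddress parse(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + ip, e);
        }
    }
}
```

Because the rules are matched against the numeric source address, this style 
of filtering needs no reverse DNS lookup at all.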

Thanks. 

> WebHDFS should support a deny/allow list for data access
> --------------------------------------------------------
>
>                 Key: HDFS-5569
>                 URL: https://issues.apache.org/jira/browse/HDFS-5569
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Adam Faris
>              Labels: features
>
> Currently we can't restrict what networks are allowed to transfer data using 
> WebHDFS.  Obviously we can use firewalls to block ports, but this can be 
> complicated and problematic to maintain.  Additionally, because all the jetty 
> servlets run inside the same container, blocking access to jetty to prevent 
> WebHDFS transfers also blocks the other servlets running inside that same 
> jetty container.
> I am requesting a deny/allow feature be added to WebHDFS.  This is already 
> done with the Apache HTTPD server, and is what I'd like to see the deny/allow 
> list modeled after.   Thanks.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
