Dmitry Pushkarev wrote:
Dear hadoop users,
I'm lucky to work in an academic environment where information security is
not an issue. However, I'm sure that most hadoop users aren't.
Here is the question: how secure is hadoop? (or, let's say, foolproof)

Right now hadoop is about as secure as NFS. When deployed onto private datacentres with good physical security and well set up networks, you can control who gets at the data. Without that, you are sharing your state with anyone who can issue HTTP and hadoop IPC requests.


Here is the answer:
http://www.google.com/search?client=opera&rls=en&q=Hadoop+Map/Reduce+Administration&sourceid=opera&ie=utf-8&oe=utf-8
not quite.


See also http://www.google.com/search?q=axis+happiness+page ; pages that we add for the benefit of the ops team end up sneaking out onto the big net.

What we're seeing here is an open hadoop cluster, where anyone who is capable
of installing hadoop and changing his username to webcrawl can use their
cluster and read their data, even though the firewall is perfectly installed
and ports like ssh are filtered to outsiders. After you've played enough with
the data, you can observe that you can submit jobs as well, and these jobs
can execute shell commands. Which is very, very sad.
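
The weakness described above comes down to the service believing whatever user name the client sends. A toy model of that trust pattern, as a sketch only: this is not Hadoop code, and the path, the owner table and the can_read function are all made up for illustration.

```python
# Toy model, NOT Hadoop code: a service that authorises requests
# purely on the user name the client chooses to send.

# Hypothetical path and owner, invented for this example.
FILE_OWNERS = {"/data/crawl/part-00000": "webcrawl"}


def can_read(claimed_user: str, path: str) -> bool:
    """Permission check that believes whatever name the caller supplies."""
    return FILE_OWNERS.get(path) == claimed_user


# The legitimate owner is allowed in ...
print(can_read("webcrawl", "/data/crawl/part-00000"))

# ... and so is any outsider who simply sets the same name locally,
# because nothing in the check proves who is really calling.
outsider_claims = "webcrawl"
print(can_read(outsider_claims, "/data/crawl/part-00000"))

# A caller using a different name is refused, but nothing forces honesty.
print(can_read("dmitry", "/data/crawl/part-00000"))
```

The check itself is correct; the problem is that its input is unauthenticated, so the permission model only protects against accidents, not against anyone who wants in.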

In my view, this significantly limits distributed hadoop applications, where
part of your cluster may reside on EC2 or another distant datacenter, since
you always need to have certain ports open to a range of IP addresses (if
your instances are dynamic), which isn't acceptable if anyone from that IP
range can connect to your cluster.
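
To make the address-range concern concrete, here is a small sketch of a range-based allow rule. Everything in it is assumed for illustration: the addresses come from ranges reserved for documentation, and firewall_allows is a stand-in for a real firewall rule, not any particular product's behaviour.

```python
import ipaddress

# Hypothetical firewall rule: allow the whole range that dynamic instances
# may be assigned from (203.0.113.0/24 is a documentation-only range).
ALLOWED_RANGE = ipaddress.ip_network("203.0.113.0/24")


def firewall_allows(source_ip: str) -> bool:
    """Return True if the source address falls inside the allowed range."""
    return ipaddress.ip_address(source_ip) in ALLOWED_RANGE


my_instance = "203.0.113.17"     # assumed: one of your own nodes
someone_elses = "203.0.113.200"  # assumed: another tenant in the same range
outside = "198.51.100.5"         # assumed: a host outside the range

for ip in (my_instance, someone_elses, outside):
    print(ip, firewall_allows(ip))
```

The rule cannot tell your own instance from a stranger's in the same range; it only keeps out hosts outside the range.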

Well, maybe that's a fault of EC2's architecture, in which a deployment request doesn't include a declaration of the network configuration?


Can we propose to the developers that they introduce some basic
user-management and access controls, to help hadoop take one step further
towards being a production-quality system?



This being an open source project, you can do more than propose: you can help build some basic user-management and access controls. As to "production quality": it is ready for production, albeit in locked-down datacentres, which are the primary deployment infrastructure of many of the active developers. As in most community-contributed open source projects, if you have specific needs beyond what the active developers need, you end up implementing them yourself.

The big issue with security is that it is all or nothing. Right now hadoop is blatantly insecure, so you should not be surprised that anyone has access to your files. To actually lock it down, you would need to authenticate and possibly encrypt all communications; this adds a lot of overhead, which is why it will be avoided in the big datacentres. You also need to go to a lot of effort to make sure it is secure across the board, with no JSP pages providing accidental privilege escalation and no API calls letting you see stuff you shouldn't. It's not like a normal feature defect where you can say "don't do that", and it's not easy to validate using functional tests, which only test the expected uses of the code. This is why securing an application is such a hard thing to do.
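
As a rough illustration of what "authenticate all communications" means in practice, the sketch below tags each request with an HMAC computed from a shared secret. It is a generic technique shown under stated assumptions, not something hadoop does: the secret, the message layout and the sign/verify helpers are invented here, and the sketch leaves out encryption, key distribution and replay protection.

```python
import hashlib
import hmac

# Assumption: a secret handed to trusted nodes out of band.
SHARED_SECRET = b"example-secret"


def sign(user: str, request: bytes, secret: bytes = SHARED_SECRET) -> str:
    """Return a hex HMAC-SHA256 tag binding the user name to the request."""
    message = user.encode("utf-8") + b"\x00" + request
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(user: str, request: bytes, tag: str,
           secret: bytes = SHARED_SECRET) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(user, request, secret), tag)


request = b"open /data/crawl/part-00000"  # hypothetical request body
tag = sign("webcrawl", request)

print(verify("webcrawl", request, tag))               # genuine request
print(verify("webcrawl", request, "0" * 64))          # forged tag
print(verify("webcrawl", b"open /data/other", tag))   # altered request
```

Even this minimal scheme costs a hash computation at both ends of every message, and every code path that accepts a request has to perform the check, which is the all-or-nothing property described above.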



--
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/
