Re: Hadoop and security.

Steve Loughran Mon, 06 Oct 2008 03:02:49 -0700

Dmitry Pushkarev wrote:

Dear hadoop users,
I'm lucky to work in an academic environment where information security is
not an issue. However, I'm sure that most hadoop users aren't.
Here is the question: how secure is hadoop?  (or let's say foolproof)

Right now hadoop is about as secure as NFS. When deployed onto private datacentres with good physical security and well set up networks, you can control who gets at the data. Without that, you are sharing your state with anyone who can issue HTTP and hadoop IPC requests.

Here is the answer:
<http://www.google.com/search?client=opera&rls=en&q=Hadoop+Map/Reduce+Administration&sourceid=opera&ie=utf-8&oe=utf-8>
not quite.

See also http://www.google.com/search?q=axis+happiness+page ; pages that we add for the benefit of the ops team end up sneaking out into the big net.

What we're seeing here is an open hadoop cluster, where anyone who is capable
of installing hadoop and changing his username to webcrawl can use their
cluster and read their data, even though the firewall is perfectly installed
and ports like ssh are filtered to outsiders. After you've played enough with
the data, you can observe that you can submit jobs as well, and these jobs can
execute shell commands. Which is very, very sad.
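
To make the failure mode above concrete, here is a minimal Python sketch of a service that believes whatever username the client sends. It is not Hadoop code: the path, the owner table and the function names are all hypothetical, chosen only to show why a client-asserted identity gives no protection.

```python
# Hypothetical sketch, NOT Hadoop's API: a service that trusts the
# username asserted by the client, with nothing to back the claim up.

FILE_OWNERS = {"/data/crawl/part-00000": "webcrawl"}  # made-up path and owner


def can_read(path, claimed_user):
    """Permission check based only on the name the client sends."""
    return FILE_OWNERS.get(path) == claimed_user


def handle_read(request):
    """`request` is a dict supplied entirely by the (untrusted) client."""
    if can_read(request["path"], request["user"]):
        return "contents of " + request["path"]
    return "permission denied"


# The real owner and an outsider who simply renames himself send
# byte-for-byte identical requests, so the server cannot tell them apart.
honest = {"user": "webcrawl", "path": "/data/crawl/part-00000"}
impostor = {"user": "webcrawl", "path": "/data/crawl/part-00000"}
stranger = {"user": "mallory", "path": "/data/crawl/part-00000"}
```

The permission check only stops a client that is honest about its own name, as the stranger request shows; anyone willing to claim to be webcrawl gets the data.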

In my view, this significantly limits distributed hadoop applications, where
part of your cluster may reside on EC2 or another distant datacenter, since
you always need to have certain ports open to a range of IP addresses (if
your instances are dynamic), which isn't acceptable if anyone from that IP
range can connect to your cluster.

Well, maybe that's a fault of EC2's architecture, in which a deployment request doesn't include a declaration of the network configuration?

Can we propose to the developers that they introduce some basic user
management and access controls, to help hadoop move one step further towards
being a production-quality system?

As this is an open source project, you can do more than propose: you can help build some basic user management and access controls. As to "production quality": it is ready for production, albeit in locked-down datacentres, which are the primary deployment infrastructure of many of the active developers. As in most community-contributed open source projects, if you have specific needs beyond what the active developers need, you end up implementing them yourself.

The big issue with security is that it is all or nothing. Right now it is blatantly insecure, so you should not be surprised that anyone has access to your files. To actually lock it down, you would need to authenticate and possibly encrypt all communications; this adds a lot of overhead, which is why it will be avoided in the big datacentres. You also need to go to a lot of effort to make sure it is secure across the board, with no JSP pages providing accidental privilege escalation and no API calls letting you see stuff you shouldn't. It's not like a normal feature defect where you can say "don't do that"; it's not so easy to validate using functional tests that test the expected uses of the code. This is why securing an application is such a hard thing to do.
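
As a rough illustration of what authenticating a single request involves at the smallest scale, here is a sketch using a shared secret and HMAC from Python's standard library. It is a toy under stated assumptions, not a proposal for how Hadoop should do it: the key, the message layout and the function names are hypothetical, and key distribution, replay protection and encryption are all left out.

```python
import hashlib
import hmac

# Hypothetical shared secret; in any real system it would have to be
# distributed out of band and kept away from untrusted clients.
SHARED_KEY = b"example-key-distributed-out-of-band"


def sign(user, command, key=SHARED_KEY):
    """Return a hex tag binding the user name to the command."""
    # Toy message layout: assumes neither field contains a newline.
    message = (user + "\n" + command).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(user, command, tag, key=SHARED_KEY):
    """Recompute the tag on the server and compare in constant time."""
    expected = sign(user, command, key)
    return hmac.compare_digest(expected, tag)
```

A client without the key cannot produce a valid tag for a name it merely claims, and a tag issued for one command does not validate another. Every request now costs a hash on both ends, which is a small taste of the overhead mentioned above, and it only helps if every entry point checks the tag.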




--
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/
