Hadoop is secure enough to be used on a cluster that has access control in
a friendly environment.  That is to say, not very.  These issues are well
known.

User identities were added recently but, as you note, they depend on
trusting unix logins and can easily be spoofed.  More secure identity
management is coming before too long.

In the EC2 environment, the presumption is that you would use network
security to limit connectivity to the hadoop cluster to just the members
of the cluster and the machines that are allowed to submit jobs.  This is
essentially no different from the way things work with any application in
EC2, such as MySQL.  You just don't allow external access.
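The "just don't allow external access" rule is essentially an allow-list
check like the one an EC2 security group applies.  A minimal Python sketch
of that logic follows; the CIDR block and port numbers are made-up
examples, not values from any real cluster.

```python
import ipaddress

# Hypothetical cluster subnet and Hadoop service ports (examples only).
ALLOWED_NETS = [ipaddress.ip_network("10.0.0.0/16")]
HADOOP_PORTS = {8020, 50010, 50030}

def connection_allowed(src_ip: str, dst_port: int) -> bool:
    """Admit a connection only if it originates inside the cluster's
    own network, the way a security-group rule would."""
    addr = ipaddress.ip_address(src_ip)
    return dst_port in HADOOP_PORTS and any(addr in net for net in ALLOWED_NETS)

print(connection_allowed("10.0.3.7", 8020))     # inside the subnet: True
print(connection_allowed("203.0.113.5", 8020))  # outside host: False
```

With a rule like this in front of the cluster, the weak internal identity
checks are never reachable by outsiders in the first place.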

Even with the new identity management, however, it isn't likely that you
would want to expose your hadoop cluster any more than you would want to
expose your database or NFS server.  If outsiders need access, then you
should have a web tier that mediates that access.
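A mediating web tier can be as simple as an endpoint that exposes only a
fixed set of vetted operations and never lets callers talk to the cluster
directly.  The sketch below is a toy outline of that idea; the operation
names are hypothetical.

```python
# Hypothetical mediating tier: outsiders call this endpoint instead of
# the cluster, and only pre-approved, read-only operations go through.
ALLOWED_OPS = {"list_results", "fetch_report"}

def handle_request(user: str, op: str) -> dict:
    """Reject anything not on the allow-list before it ever reaches
    the cluster."""
    if op not in ALLOWED_OPS:
        raise PermissionError(f"{user} may not perform {op}")
    # A real tier would now run a vetted query against the cluster on
    # the caller's behalf; here we just acknowledge the request.
    return {"user": user, "op": op, "status": "ok"}

print(handle_request("alice", "list_results"))
# handle_request("mallory", "submit_job") would raise PermissionError
```

The point is that arbitrary job submission (and thus arbitrary shell
execution) is simply not among the operations the outside world can name.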

On Sat, Oct 4, 2008 at 11:54 PM, Dmitry Pushkarev <[EMAIL PROTECTED]> wrote:

> Dear hadoop users,
>
> I'm lucky to work in an academic environment where information security
> is not an issue.  However, I'm sure that most hadoop users aren't.
>
> Here is the question: how secure is hadoop?  (or, let's say, how
> foolproof?)
>
> Here is the answer:
> http://www.google.com/search?client=opera&rls=en&q=Hadoop+Map/Reduce+Administration&sourceid=opera&ie=utf-8&oe=utf-8
> not quite.
>
> What we're seeing here is an open hadoop cluster, where anyone capable
> of installing hadoop and changing his username to webcrawl can use the
> cluster and read its data, even though the firewall is properly
> installed and ports like ssh are filtered to outsiders.  Once you have
> played with the data long enough, you notice that you can submit jobs
> as well, and that those jobs can execute shell commands.  Which is
> very, very sad.
>
> In my view, this significantly limits distributed hadoop applications,
> where part of your cluster may reside on EC2 or in another distant
> datacenter, since you always need to keep certain ports open to an
> array of ip addresses (if your instances are dynamic), which isn't
> acceptable when anyone in that ip range can connect to your cluster.
>
> Can we ask the developers to introduce some basic user management and
> access controls, to help hadoop take one step further towards being a
> production-quality system?
>
> And, by the way, please add a robots.txt to the default distribution.
> (But I doubt it will help, as it takes less than a week to scan the
> whole internet for a given port from a home DSL connection...)
>
> ---
>
> Dmitry
>


-- 
ted
