> How do you define the 'Hadoop complex eco-system'? If that definition
Agreed, complex is a relative term. I used the term complex because more than 
20 products now use Hadoop and the list is growing. There are 10 products 
listed on http://hadoop.apache.org/. Then there are other projects like 
Accumulo, Impala, Storm, Kafka, Falcon, Pig, Flume, Sqoop, Oozie, etc. which 
use HDFS or support/enable other products within the Hadoop ecosystem. If we 
dig deeper, each component might have multiple processes (Name Node, Data Node, 
Job Tracker, Storm Nimbus Server, HBase Master Servers, HBase Region Servers, 
HA, etc). With YARN, users can now run their applications in the cluster, which 
is a great feature, but it is very scary from a security point of view, because 
users can now write their own custom applications and run them within a secure 
data center.

I don’t feel one technology or one company or one small group or one approach 
can solve this problem. This has to be addressed by the community working 
together. This would also require a lot of support from each dependent project 
and a lot of coordination. And there would be multiple security solutions 
available for the end users to pick from.

> includes projects such as HBase, we have significant security controls, so
The mature projects have started beefing up their security features. In recent 
releases, HBase added cell-based access control and encryption, HDFS added 
advanced ACLs and is now working on file-level encryption, and Hive added 
ATZ-NG but has no encryption yet. The newer ones like Solr, Storm, and Falcon 
have very basic security controls. On the good news side, most components have 
started supporting Kerberos and SSL. But encryption at rest is still a 
challenge. In most cases it is all or none, except probably HBase and Accumulo. 
Access control and auditing are also not that mature among the newer projects. 
The goal here is not to reinvent or impose on each project, but to reuse the 
existing security technologies consistently across projects and at the same 
time extend them where applicable.

> or the combination of Hive+Sentry would agree with that statement either.
Personally, Hive is my ideal role model for all Hadoop projects to follow. Out 
of the box, it has built-in access control, but it also provides APIs to plug 
in your own authorization model. Now security projects like Argus can extend it 
to support attribute-based access control, cell-based access control, tagging, 
multi-tenancy, auditing, etc. Users, based on their security requirements or 
appetite, might decide to go with the default or choose one of the other 
security providers. Similar requirements might be there for HBase, but 
expecting all Hadoop components to keep up with each other is 
counterproductive, while a dedicated security provider (project) might do a 
more extensive and uniform job. Users might also pick multiple security 
providers within their cluster to address specific security concerns.
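
To make the plug-in idea concrete, here is a minimal sketch in Python. It is 
not Hive's or Argus's actual API; every name in it (AccessRequest, Authorizer, 
DefaultAuthorizer, AttributeAuthorizer, Component) is hypothetical and only 
illustrates a component that ships a default authorizer but accepts an 
external one through the same interface.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AccessRequest:
    user: str
    action: str    # e.g. "select"
    resource: str  # e.g. "sales.orders"


class Authorizer(Protocol):
    """Hypothetical plug-in point that a component would expose."""

    def is_allowed(self, request: AccessRequest) -> bool: ...


class DefaultAuthorizer:
    """Stand-in for a built-in model: explicit (user, action, resource) grants."""

    def __init__(self, grants):
        self._grants = set(grants)

    def is_allowed(self, request):
        return (request.user, request.action, request.resource) in self._grants


class AttributeAuthorizer:
    """Stand-in for an external provider: attribute (tag) based decisions."""

    def __init__(self, user_attrs, resource_tags):
        self._user_attrs = user_attrs        # user -> set of attributes held
        self._resource_tags = resource_tags  # resource -> set of required attributes

    def is_allowed(self, request):
        if request.resource not in self._resource_tags:
            return False  # untagged resources are denied by default
        held = self._user_attrs.get(request.user, set())
        return self._resource_tags[request.resource] <= held


class Component:
    """A data component that delegates every decision to the plugged-in authorizer."""

    def __init__(self, authorizer: Authorizer):
        self._authorizer = authorizer

    def query(self, user, table):
        if not self._authorizer.is_allowed(AccessRequest(user, "select", table)):
            raise PermissionError(f"{user} may not select from {table}")
        return f"rows from {table}"


# Same component code, two interchangeable security providers.
with_default = Component(DefaultAuthorizer({("alice", "select", "sales.orders")}))
with_external = Component(
    AttributeAuthorizer(
        user_attrs={"alice": {"finance"}, "bob": {"marketing"}},
        resource_tags={"sales.orders": {"finance"}},
    )
)
```

The component itself never changes; only the object handed to it does, which 
is what lets users stay with the default or pick another provider.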

Since we are on the topic of complexity, one of the reasons Hadoop is popular 
is its openness. Hive might sit on top of anything, e.g. HDFS, HBase+HDFS, flat 
files, etc. While you can run SQL queries via Hive, you can also write a Pig or 
MR job to access the underlying HDFS files directly. This is a powerful 
feature, which gives users the ability to run sophisticated analytical jobs or 
use enterprise-grade BI tools. But it also allows users to circumvent Hive’s 
native security. For Hive or any native component, cross-component security is 
out of scope (and should be). This problem can be solved by security providers 
like Argus, which can enforce adequate security consistently across component 
or project boundaries. 
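
As a rough illustration of cross-component enforcement, the sketch below 
(Python, with entirely hypothetical names and paths, not Argus's actual design) 
maps a Hive table and its underlying HDFS path to one logical dataset, so a 
single grant or denial applies whichever access path the user takes.

```python
class CentralPolicy:
    """Hypothetical central policy store shared by several components."""

    def __init__(self):
        self._dataset_of = {}  # (component, resource) -> logical dataset name
        self._grants = set()   # (user, dataset) pairs that are allowed

    def register(self, component, resource, dataset):
        """Declare that a component-level resource belongs to a dataset."""
        self._dataset_of[(component, resource)] = dataset

    def grant(self, user, dataset):
        self._grants.add((user, dataset))

    def check(self, user, component, resource):
        """Allow only registered resources whose dataset is granted to the user."""
        dataset = self._dataset_of.get((component, resource))
        return dataset is not None and (user, dataset) in self._grants


policy = CentralPolicy()
# The table name and path are made up for this example.
policy.register("hive", "sales.orders", "orders")
policy.register("hdfs", "/warehouse/sales/orders", "orders")
policy.grant("alice", "orders")

# alice is allowed through both access paths; bob is denied through both,
# so reading the file directly does not bypass the table-level decision.
for user in ("alice", "bob"):
    via_hive = policy.check(user, "hive", "sales.orders")
    via_hdfs = policy.check(user, "hdfs", "/warehouse/sales/orders")
    print(user, via_hive, via_hdfs)
```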

Happy to discuss more on this topic.

Thanks

Bosco


On Jul 16, 2014, at 7:38 PM, Andrew Purtell <apurt...@apache.org> wrote:

> This statement might not be quite right:
> 
>> Even within Hadoop complex eco-system, each components have limited or no
> security controls.
> 
> How do you define the 'Hadoop complex eco-system'? If that definition
> includes projects such as HBase, we have significant security controls, so
> that wouldn't be a correct statement. Not sure those working on Accumulo,
> or the combination of Hive+Sentry would agree with that statement either.
> 
> It's not necessary to survey the Hadoop ecosystem before incubating of
> course, or even after, but it sounds like that might be a good idea.
> 
> 
> 
> On Wed, Jul 16, 2014 at 5:06 PM, Don Bosco Durai <bdu...@hortonworks.com>
> wrote:
> 
>> Hi JB
>> 
>> We will be centralizing the administration and auditing for Knox. And we
>> will be also standardizing the authentication for web applications for all
>> components within Hadoop ecosystem, for which we might consider Shiro. I
>> would like to understand more about Syncope and see how production ready it
>> is...
>> 
>> The principle is to leverage existing security solutions where applicable.
>> Even within Hadoop complex eco-system, each components have limited or no
>> security controls. Instead of re-inventing everything, we will extend the
>> core component security capabilities and add where needed. So the security
>> is uniform, plug able and scalable.
>> 
>> Providing a layered security along with central administration and
>> auditing capabilities will enhance the security, usability, enterprise
>> integration, compliance, etc. which will lead to more adoption of Apache
>> Hadoop and projects working within its eco system.
>> 
>> Regards
>> 
>> Bosco
>> 
>> 
>> On Jul 16, 2014, at 12:12 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>> 
>>> Hi,
>>> 
>>> it looks interesting.
>>> 
>>> Do you have an idea about the interactions with other projects (Knox,
>> Shiro, Syncope, whatever) ?
>>> 
>>> Regards
>>> JB
>>> 
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> For additional commands, e-mail: general-h...@incubator.apache.org
>>> 
>> 
>> 
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>> 
>> 
>> 
> 
> 
> -- 
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)


