>> I believe encryption is becoming a core part of Hadoop. I think that 
>> moving core components out of Hadoop is bad from a project management 
>> perspective.

> Although it's certainly true that encryption capabilities (in HDFS, YARN, 
> etc.) are becoming core to Hadoop, I don't think that should really influence 
> whether or not the non-Hadoop-specific encryption routines should be part of 
> the Hadoop code base, or part of the code base of another project that Hadoop 
> depends on. If Chimera had existed as a library hosted at ASF when HDFS 
> encryption was first developed, HDFS probably would have just added that as a 
> dependency and been done with it. I don't think we would've copy/pasted the 
> code for Chimera into the Hadoop code base.

Agree with ATM. I also want to make an additional clarification. I agree 
that encryption capabilities are becoming core to Hadoop. This effort, 
however, is about putting the common, shared encryption routines, such as 
the crypto stream implementations, into a scope where they can be widely 
reused across the Apache ecosystem. It does not move Hadoop encryption out 
of Hadoop (that is not possible). 
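
To illustrate what I mean by "crypto stream implementations": below is a 
minimal sketch of such a routine, written against the stock JDK 
javax.crypto API only (Chimera's real classes are JNI-accelerated and 
named differently; everything here is illustrative, not Chimera's actual 
API). Note there is nothing Hadoop-specific about it, which is exactly why 
it can be shared across the ecosystem:

    import java.io.InputStream;
    import javax.crypto.Cipher;
    import javax.crypto.CipherInputStream;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    /** Wraps any InputStream with AES/CTR decryption -- the kind of
     *  generic encryption routine this effort would factor out. */
    public final class DecryptingStreams {
        public static InputStream wrap(InputStream in, byte[] key, byte[] iv)
                throws Exception {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE,
                        new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(iv));
            return new CipherInputStream(in, cipher);
        }
    }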

I agree that making it a separately and independently released project 
within Hadoop would go a step further than the existing approach and would 
solve some issues (such as the libhadoop.so problem). Frankly speaking, 
though, I don't think it is the best option we can try. I also expect that 
an independently released project within Hadoop core would complicate 
Hadoop's existing release process. 

Thanks,
Haifeng

-----Original Message-----
From: Aaron T. Myers [mailto:a...@cloudera.com] 
Sent: Friday, January 29, 2016 9:51 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

On Wed, Jan 27, 2016 at 11:31 AM, Owen O'Malley <omal...@apache.org> wrote:

> I believe encryption is becoming a core part of Hadoop. I think that 
> moving core components out of Hadoop is bad from a project management 
> perspective.
>

Although it's certainly true that encryption capabilities (in HDFS, YARN,
etc.) are becoming core to Hadoop, I don't think that should really influence 
whether or not the non-Hadoop-specific encryption routines should be part of 
the Hadoop code base, or part of the code base of another project that Hadoop 
depends on. If Chimera had existed as a library hosted at ASF when HDFS 
encryption was first developed, HDFS probably would have just added that as a 
dependency and been done with it. I don't think we would've copy/pasted the 
code for Chimera into the Hadoop code base.


> To put it another way, a bug in the encryption routines will likely 
> become a security problem that security@hadoop needs to hear about.
>
> I don't think
> adding a separate project in the middle of that communication chain is 
> a good idea. The same applies to data corruption problems, and so on...
>

Isn't the same true of all the libraries that Hadoop currently depends upon? If 
the commons-httpclient library (or commons-codec, or commons-io, or guava, 
or...) has a security vulnerability, we need to know about it so that we can 
update our dependency to a fixed version. This case doesn't seem materially 
different than that.
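
For instance, picking up a fixed Chimera release would be the same 
one-line version bump in our pom.xml that we already do for those 
libraries (the coordinates and versions below are hypothetical, for 
illustration only):

    <!-- Hypothetical coordinates/versions, for illustration only. -->
    <dependency>
      <groupId>org.apache.chimera</groupId>
      <artifactId>chimera</artifactId>
      <!-- bump from the vulnerable release to the patched one -->
      <version>1.0.1</version>
    </dependency>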


>
>
> > It may be good to keep it at a generalized place (as in the discussion, 
> > we thought that place could be Apache Commons).
>
>
> Apache Commons is a collection of *Java* projects, so Chimera as a 
> JNI-based library isn't a natural fit.
>

Could very well be that Apache Commons's charter would preclude Chimera.
You probably know better than I do about that.


> Furthermore, Apache Commons doesn't
> have its own security list so problems will go to the generic 
> secur...@apache.org.
>

That seems easy enough to remedy if they wanted to, and besides, I'm not 
sure why that would influence this discussion. In my experience, projects 
that don't have a separate security@project.a.o mailing list tend to just 
handle security issues on their private@project.a.o mailing list, which 
seems fine to me.


>
> Why do you think that Apache Commons is a better home than Hadoop?
>

I'm certainly not at all wedded to Apache Commons; that just seemed like a 
natural place to put it to me. Could be that a brand new TLP might make 
more sense.

I *do* think that if other non-Hadoop projects want to make use of Chimera, 
which, as I understand it, is the goal that started this thread, then 
Chimera should exist outside of Hadoop so that:

a) Projects that have nothing to do with Hadoop can just depend directly on 
Chimera, which has nothing Hadoop-specific in there.

b) The Hadoop project doesn't have to export/maintain/concern itself with yet 
another publicly-consumed interface.

c) Chimera can have its own (presumably much faster) release cadence completely 
separate from Hadoop.

--
Aaron T. Myers
Software Engineer, Cloudera
