I think you have misidentified the two camps. There is a camp that believes we should phase out the code in favour of the HDFS encryption, and a camp that believes the code is sufficiently mature. I don't think there is a group that is interested in improving the state of things.
On Thu, Nov 5, 2015 at 12:02 PM, Christopher <ctubb...@apache.org> wrote:
> JIRAs are fine, but I thought this thread was mostly addressing the fact that there doesn't seem to be a sustained interest in actually working on any of the JIRAs addressing that area of code. Am I wrong? Is there willingness from anybody to expend effort on this code? Even if not, we can still make JIRAs, but they'll probably just be ignored. So, the question for me is: which JIRAs should we make? Are we going to pursue phasing out the code, or pursue improving it? Those are very different JIRA text.
>
> On Thu, Nov 5, 2015 at 12:22 PM Mike Drob <md...@apache.org> wrote:
> > Can we file some JIRAs to build out a suite to test this and run the necessary tests?
> >
> > On Thu, Nov 5, 2015 at 11:17 AM, Christopher <ctubb...@apache.org> wrote:
> > > My main concern using HDFS encryption vs. built-in Accumulo implementation is possibly performance with respect to seeks. If we encrypt our indexed blocks independently (as we do now), I suspect our seeks would be more performant than relying on HDFS encryption, whose encrypted blocks may not fall on our index boundaries. If this is a small difference, it might still be worth it for convenience and simpler maintenance, but I suspect the difference will be somewhat substantial.
> > >
> > > On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.el...@gmail.com> wrote:
> > > > +1 I think this is the right step. My hunch is that some of the common data access patterns that we have in Accumulo (over HBase) is that the per-colfam encryption isn't quite as common a design pattern as it is for HBase (please tell me I'm wrong if anyone disagrees -- this is mostly a gut reaction). I think our users would likely benefit more from a per-namespace/table encryption control like you suggest.
> > > >
> > > > Implementing RFile encryption at HDFS level (e.g. tie a specific zone/key for a table) is probably straightforward. Changing the TServer's WAL use would likely be trickier to get right (a tserver would have multiple WALs, one for each unique zone/key from Tablet it happens to host). Maybe worrying about that is getting ahead of things -- just thought about it and figured I'd mention it :)
> > > >
> > > > William Slacum wrote:
> > > > > Yup, #2. I also don't know if it's worth the effort for that specific feature. It might be easier to add something like per-namespace and/or per-table encryption, then define common access patterns for applications that want to use multiple keys for encryption.
> > > > >
> > > > > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afu...@apache.org> wrote:
> > > > >> Bill,
> > > > >>
> > > > >> Do you envision one of the following as the driver behind finer-grained encryption?:
> > > > >>
> > > > >> 1. We would only encrypt certain columns in order to get better performance;
> > > > >>
> > > > >> 2. We would use different keys on different columns in order to revoke access to a column via the key store;
> > > > >>
> > > > >> 3. We would only give a tablet server access to a subset of columns at any given time in order to protect something, and figure out what to do for compactions, etc.;
> > > > >>
> > > > >> 4. Something entirely different...
> > > > >>
> > > > >> Seems like thing #2 might have merit, but I'm not sure it's worth the effort.
> > > > >>
> > > > >> Adam
> > > > >>
> > > > >> On Nov 4, 2015 7:38 PM, "William Slacum" <wsla...@gmail.com> wrote:
> > > > >>> @Adam, column family level encryption can be useful for multi-tenant environments, and I think it maps pretty well to the document partitioning/sharding/wikisearch style tables. Things are trickier in Accumulo than in HBase since there isn't a 1:1 mapping between column families and files. The built in RFile encryption scheme seems better suited to this.
> > > > >>>
> > > > >>> @Christopher & Keith, it's something we can evaluate. Is there a good test harness for just writing an RFile, opening a reader to it, and just poking around? I was looking at the constructors and they didn't seem straightforward enough for me to comprehend them within a few seconds.
> > > > >>>
> > > > >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <ke...@deenlo.com> wrote:
> > > > >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <ke...@deenlo.com> wrote:
> > > > >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wsla...@gmail.com> wrote:
> > > > >>>>>> Is "the code being 'at rest'" you making a funny about active development? Making sure I haven't lost my ability to get jokes :)
> > > > >>>>>>
> > > > >>>>>> I see two reasons why the code would be inactive: the feature is good enough as is or it's not interesting enough to attract attention.
> > > > >>>>>> Considering it's not public API, there are no discussions to bring into the public API, and there's no effort to document how to use it, my intuition tells me that there isn't enough interest in it from a project perspective.
> > > > >>>>>>
> > > > >>>>>> From a user perspective, I've been getting asked about it when I work with Accumulo users. My recommendation, exclusively, is to use HDFS encryption because I can go to Hadoop's website and find documentation on it. When I go to find documentation on Accumulo's offerings, any usability information comes from vendor SlideShares. Most mentions of the feature on official Apache Accumulo channels echo Christopher's sentiments on the feature being experimental and not being officially recommended for use.
> > > > >>>>>>
> > > > >>>>>> I wouldn't want to rip out the feature first and then figure things out later. Sean already alluded to it, but a roadmap should contain something (tool or documentation) to help users migrate if we go down that route.
> > > > >>>>>> What I'm trying to figure out is, when the question of "How do I do encryption at rest in Accumulo?" comes up, what is our community's answer?
> > > > >>>>>> If we went down the route of using HDFS encryption zones, can we offer the same features? At the very least, we'd be offering the same database-level
> > > > >>>>> Where does the decryption happen with DFS, is it in the DFS client?
> > > > >>>>> If so, using HDFS level encryption seems to offer the same functionality??? Has anyone written a tool that takes an Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any unexpected gotchas w/ this.
> > > > >>>>>
> > > > >>>> I was discussing my questions w/ Christopher today and he mentioned an experiment that I thought was interesting. What is the random seek performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs Accumulo-unencrypted-HDFS-encrypted-RFile?
> > > > >>>>
> > > > >>>>>> encryption scheme. I don't know the details of "more advanced key stores", but it seems like we could potentially take any custom implementation and map it to a KeyProvider [1]. I could also envision table level encryption being implementable via zones, but probably not down to the column family level.
> > > > >>>>>>
> > > > >>>>>> [1] https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> > > > >>>>>>
> > > > >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afu...@apache.org> wrote:
> > > > >>>>>>> Responses inline.
> > > > >>>>>>>
> > > > >>>>>>> Adam
> > > > >>>>>>>
> > > > >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubb...@apache.org> wrote:
> > > > >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'.
> > > > >>>>>>>> What it does is provide partial encryption-at-rest protection (unless you're running without walogs, and have good integration with some external secure key management facility, and then it's probably fine).
> > > > >>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery file. That is a project we should take on, but it does not imply that the existing features are not valuable. With HDFS encryption options this would now be a much easier project to take on. Also, the users I know that use encryption at rest do so with a more secure key store than the default.
> > > > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R don't necessarily realize its current shortcomings, or its lack of upstream maintenance support (which it has not been receiving). It may be the case that these users have support from an intermediary, and do understand the shortcomings... I don't know, but it's a concern.
> > > > >>>>>>> Anybody that creates a secure system has to analyze the security of the system as a whole. Accumulo's encryption at rest is one part of the solution. Taking away the tool without providing an alternative does nothing to improve the security of systems built on Accumulo.
> > > > >>>>>>>
> > > > >>>>>>>> 3. Correction: it has been an explicitly experimental feature and an incomplete one, which hasn't really been touched in two years, and has been explicitly excluded by the community for being public API because of its incompleteness. Age doesn't determine public API status. The community does.
> > > > >>>>>>>
> > > > >>>>>>> People are using it, so we have to consider the implications of whatever changes we make and weigh against the benefits. I believe the last bug fix was done this year, so I would argue it is being maintained. Changes to our encryption at rest implementation will have consequences for those users. There had better be a clear benefit if we break their systems.
> > > > >>>>>>>
> > > > >>>>>>>> 4. Has Accumulo's been evaluated for security and performance? By whom? Is it published?
> > > > >>>>>>> Yes, there have been several talks at meetups and conferences that discuss the security and performance of the current solution.
> > > > >>>>>>>
> > > > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afu...@apache.org> wrote:
> > > > >>>>>>>>> There's another way to look at the state of Accumulo's encryption at rest:
> > > > >>>>>>>>> 1. Encryption at rest works great for what it does, and the code being "at rest" isn't necessarily a problem
> > > > >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest effectively in operations
> > > > >>>>>>>>> 3. Encryption at rest has been a supported configuration option for over two years with established plugin interfaces, and therefore it should be considered part of the public API
> > > > >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed for performance or security
> > > > >>>>>>>>>
> > > > >>>>>>>>> The given option #2 would at least require an analysis of alternatives, and we would have to decide what to do about backwards compatibility for users using custom key stores and encryption strategies that may or may not be supported by upstream alternatives.
> > > > >>>>>>>>>
> > > > >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to take up projects to improve Accumulo's encryption. I think we're already going down this path, but without having identified resources to do the improvements. Any volunteers?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Adam
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wsla...@gmail.com> wrote:
> > > > >>>>>>>>>> So I've been looking into options for providing encryption at rest, and it seems like what Accumulo has is abandonware from a project perspective. There is no official documentation on how to perform encryption at rest, and the best information from its status comes from year (or greater) old ticket comments about how the feature is still experimental. Recently there was a talk that described using HDFS encryption zones as an alternative.
> > > > >>>>>>>>>> From my perspective, this is what I see as the current situation:
> > > > >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> > > > >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or marketed capabilities
> > > > >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira comments or presentations
> > > > >>>>>>>>>> 4- A viable alternative exists that appears to have feature parity in HDFS encryption
> > > > >>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend beyond what HDFS provides
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Moving forward, what's the consensus for supporting this feature? Personally, I see two options:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> 1- Start going down a path to bring the feature into the forefront and start providing feature parity with HBase
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> or
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption offerings
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Any input is welcomed & appreciated!