Perhaps. I had interpreted some of Adam's comments ("The only thing that
doesn't get encrypted is a temporary WAL recovery file. That is a project
we should take on...") as favoring improvements to the current state of
things. As that has also been the focus of previous conversations about the
state of Accumulo's encryption-at-rest, I assumed that third camp also
existed. Perhaps I was wrong.

On Thu, Nov 5, 2015 at 1:11 PM Mike Drob wrote:
> I think you have misidentified the two camps. There is a camp that believes we should phase out the code in favour of the HDFS encryption, and a camp that believes the code is sufficiently mature. I don't think there is a group that is interested in improving the state of things.
>
> On Thu, Nov 5, 2015 at 12:02 PM, Christopher wrote:
>
>> JIRAs are fine, but I thought this thread was mostly addressing the fact that there doesn't seem to be a sustained interest in actually working on any of the JIRAs addressing that area of code. Am I wrong? Is there willingness from anybody to expend effort on this code? Even if not, we can still make JIRAs, but they'll probably just be ignored. So, the question for me is: which JIRAs should we make? Are we going to pursue phasing out the code, or pursue improving it? Those are very different JIRA text.
>>
>> On Thu, Nov 5, 2015 at 12:22 PM Mike Drob wrote:
>>
>>> Can we file some JIRAs to build out a suite to test this and run the necessary tests?
>>>
>>> On Thu, Nov 5, 2015 at 11:17 AM, Christopher wrote:
>>>
>>>> My main concern using HDFS encryption vs. built-in Accumulo implementation is possibly performance with respect to seeks. If we encrypt our indexed blocks independently (as we do now), I suspect our seeks would be more performant than relying on HDFS encryption, whose encrypted blocks may not fall on our index boundaries. If this is a small difference, it might still be worth it for convenience and simpler maintenance, but I suspect the difference will be somewhat substantial.
>>>>
>>>> On Thu, Nov 5, 2015 at 12:11 PM Josh Elser wrote:
>>>>
>>>>> +1 I think this is the right step.
>>>>> My hunch is that some of the common data access patterns that we have in Accumulo (over HBase) is that the per-colfam encryption isn't quite as common a design pattern as it is for HBase (please tell me I'm wrong if anyone disagrees -- this is mostly a gut reaction). I think our users would likely benefit more from a per-namespace/table encryption control like you suggest.
>>>>>
>>>>> Implementing RFile encryption at HDFS level (e.g. tie a specific zone/key for a table) is probably straightforward. Changing the TServer's WAL use would likely be trickier to get right (a tserver would have multiple WALs, one for each unique zone/key from Tablet it happens to host). Maybe worrying about that is getting ahead of things -- just thought about it and figured I'd mention it :)
>>>>>
>>>>> William Slacum wrote:
>>>>>> Yup, #2. I also don't know if it's worth the effort for that specific feature. It might be easier to add something like per-namespace and/or per-table encryption, then define common access patterns for applications that want to use multiple keys for encryption.
>>>>>>
>>>>>> On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs wrote:
>>>>>>
>>>>>>> Bill,
>>>>>>>
>>>>>>> Do you envision one of the following as the driver behind finer-grained encryption?:
>>>>>>>
>>>>>>> 1. We would only encrypt certain columns in order to get better performance;
>>>>>>>
>>>>>>> 2. We would use different keys on different columns in order to revoke access to a column via the key store;
>>>>>>>
>>>>>>> 3. We would only give a tablet server access to a subset of columns at any given time in order to protect something, and figure out what to do for compactions, etc.;
>>>>>>>
>>>>>>> 4. Something entirely different...
>>>>>>>
>>>>>>> Seems like thing #2 might have merit, but I'm not sure it's worth the effort.
>>>>>>>
>>>>>>> Adam
>>>>>>>
>>>>>>> On Nov 4, 2015 7:38 PM, "William Slacum" wrote:
>>>>>>>
>>>>>>>> @Adam, column family level encryption can be useful for multi-tenant environments, and I think it maps pretty well to the document partitioning/sharding/wikisearch style tables. Things are trickier in Accumulo than in HBase since there isn't a 1:1 mapping between column families and files. The built in RFile encryption scheme seems better suited to this.
>>>>>>>>
>>>>>>>> @Christopher & Keith, it's something we can evaluate. Is there a good test harness for just writing an RFile, opening a reader to it, and just poking around? I was looking at the constructors and they didn't seem straightforward enough for me to comprehend them within a few seconds.
>>>>>>>>
>>>>>>>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner wrote:
>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum wrote:
>>>>>>>>>>> Is "the code being 'at rest'" you making a funny about active development? Making sure I haven't lost my ability to get jokes :)
>>>>>>>>>>>
>>>>>>>>>>> I see two reasons why the code would be inactive: the feature is good enough as is or it's not interesting enough to attract attention. Considering it's not public API, there are no discussions to bring into the public API, and there's no effort to document how to use it, my intuition tells me that there isn't enough interest in it from a project perspective.
>>>>>>>>>>>
>>>>>>>>>>> From a user perspective, I've been getting asked about it when I work with Accumulo users. My recommendation, exclusively, is to use HDFS encryption because I can go to Hadoop's website and find documentation on it. When I go to find documentation on Accumulo's offerings, any usability information comes from vendor SlideShares.
>>>>>>>>>>> Most mentions of the feature on official Apache Accumulo channels echo Christopher's sentiments on the feature being experimental and not being officially recommended for use.
>>>>>>>>>>>
>>>>>>>>>>> I wouldn't want to rip out the feature first and then figure things out later. Sean already alluded to it, but a roadmap should contain something (tool or documentation) to help users migrate if we go down that route. What I'm trying to figure out is, when the question of "How do I do encryption at rest in Accumulo?" comes up, what is our community's answer? If we went down the route of using HDFS encryption zones, can we offer the same features? At the very least, we'd be offering the same database-level
>>>>>>>>>>
>>>>>>>>>> Where does the decryption happen with DFS, is it in the DFS client? If so, using HDFS level encryption seems to offer the same functionality??? Has anyone written a tool that takes an Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any unexpected gotchas w/ this.
>>>>>>>>>>
>>>>>>>>> I was discussing my questions w/ Christopher today and he mentioned an experiment that I thought was interesting. What is the random seek performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs Accumulo-unencrypted-HDFS-encrypted-RFile?
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> encryption scheme.
>>>>>>>>>>> I don't know the details of "more advanced key stores", but it seems like we could potentially take any custom implementation and map it to a KeyProvider [1]. I could also envision table level encryption being implementable via zones, but probably not down to the column family level.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs wrote:
>>>>>>>>>>>> Responses inline.
>>>>>>>>>>>>
>>>>>>>>>>>> Adam
>>>>>>>>>>>>
>>>>>>>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" wrote:
>>>>>>>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it does is provide partial encryption-at-rest protection (unless you're running without walogs, and have good integration with some external secure key management facility, and then it's probably fine).
>>>>>>>>>>>>
>>>>>>>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery file. That is a project we should take on, but it does not imply that the existing features are not valuable. With HDFS encryption options this would now be a much easier project to take on.
Also, the users I > > know > > > > > >> that > > > > > >>>> use > > > > > >>>>>>> encryption at rest do so with a more secure key store than > > the > > > > > >>>> default. > > > > > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R don't > > > > > >>>> necessarily > > > > > >>>>>>>> realize its current shortcomings, or its lack of upstream > > > > > >>>> maintenance > > > > > >>>>>>>> support (which it has not been receiving). It may be the > > case > > > > > >> that > > > > > >>>>>> these > > > > > >>>>>>>> users have support from an intermediary, and do understand > > the > > > > > >>>>>>>> shortcomings... I don't know, but it's a concern. > > > > > >>>>>>> Anybody that creates a secure system has to analyze the > > > security > > > > > >> of > > > > > >>>> the > > > > > >>>>>>> system as a whole. Accumulo's encryption at rest is one > part > > of > > > > > >> the > > > > > >>>>>>> solution. Taking away the tool without providing an > > alternative > > > > > >> does > > > > > >>>>>>> nothing to improve the security of systems built on > Accumulo. > > > > > >>>>>>> > > > > > >>>>>>>> 3. Correction: it has been an explicitly experimental > > feature > > > > > >> and > > > > > >>> an > > > > > >>>>>>>> incomplete one, which hasn't really been touched in two > > years, > > > > > >> and > > > > > >>>> has > > > > > >>>>>>> been > > > > > >>>>>>>> explicitly excluded by the community for being public API > > > > > >> because > > > > > >>> of > > > > > >>>>>> its > > > > > >>>>>>>> incompleteness. Age doesn't determine public API status. > The > > > > > >>>> community > > > > > >>>>>>> does. > > > > > >>>>>>> > > > > > >>>>>>> People are using it, so we have to consider the > implications > > of > > > > > >>>> whatever > > > > > >>>>>>> changes we make and weigh against the benefits. I believe > the > > > > last > > > > > >>> bug > > > > > >>>>>> fix > > > > > >>>>>>> was done this year, so I would argue it is being > maintained. 
> > > > > >> Changes > > > > > >>>> to > > > > > >>>>>> our > > > > > >>>>>>> encryption at rest implementation will have consequences > for > > > > those > > > > > >>>>>> users. > > > > > >>>>>>> There had better be a clear benefit if we break their > > systems. > > > > > >>>>>>> > > > > > >>>>>>>> 4. Has Accumulo's been evaluated for security and > > performance? > > > > > >> By > > > > > >>>>>> whom? > > > > > >>>>>>> Is > > > > > >>>>>>>> it published? > > > > > >>>>>>> Yes, there have been several talks at meetups and > conferences > > > > that > > > > > >>>>>> discuss > > > > > >>>>>>> the security and performance of the current solution. > > > > > >>>>>>> > > > > > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs<[email protected] > > > > > >>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: > > > > > >>>>>>>>> There's another way to look at the state of Accumulo's > > > > > >>> encryption > > > > > >>>> at > > > > > >>>>>>> rest: > > > > > >>>>>>>>> 1. Encryption at rest works great for what it does, and > the > > > > > >> code > > > > > >>>>>> being > > > > > >>>>>>> "at > > > > > >>>>>>>>> rest" isn't necessarily a problem > > > > > >>>>>>>>> 2. Several organizations are using Accumulo's encryption > at > > > > > >> rest > > > > > >>>>>>>>> effectively in operations > > > > > >>>>>>>>> 3. Encryption at rest has been a supported configuration > > > > > >> option > > > > > >>>> for > > > > > >>>>>>> over > > > > > >>>>>>>>> two years with established plugin interfaces, and > therefore > > > it > > > > > >>>>>> should > > > > > >>>>>>> be > > > > > >>>>>>>>> considered part of the public API > > > > > >>>>>>>>> 4. 
Upstream alternatives (to my knowledge) have not been > > > > > >>> analyzed > > > > > >>>>>> for > > > > > >>>>>>>>> performance or security > > > > > >>>>>>>>> > > > > > >>>>>>>>> The given option #2 would at least require an analysis of > > > > > >>>>>> alternatives, > > > > > >>>>>>> and > > > > > >>>>>>>>> we would have to decide what to do about backwards > > > > > >> compatibility > > > > > >>>> for > > > > > >>>>>>> users > > > > > >>>>>>>>> using custom key stores and encryption strategies that > may > > or > > > > > >>> may > > > > > >>>>>> not > > > > > >>>>>>> be > > > > > >>>>>>>>> supported by upstream alternatives. > > > > > >>>>>>>>> > > > > > >>>>>>>>> As far as option #1 goes, I can get behind encouraging > > people > > > > > >> to > > > > > >>>>>> take > > > > > >>>>>>> up > > > > > >>>>>>>>> projects to improve Accumulo's encryption. I think we're > > > > > >> already > > > > > >>>>>> going > > > > > >>>>>>> down > > > > > >>>>>>>>> this path, but without having identified resources to do > > the > > > > > >>>>>>> improvements. > > > > > >>>>>>>>> Any volunteers? > > > > > >>>>>>>>> > > > > > >>>>>>>>> Adam > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum< > > > > > >>>> [email protected]<javascript:_e(%7B%7D,'cvml',' > > [email protected] > > > > ');>> > > > > > >>>>>>> wrote: > > > > > >>>>>>>>>> So I've been looking into options for providing > encryption > > > > > >> at > > > > > >>>>>> rest, > > > > > >>>>>>> and > > > > > >>>>>>>>> it > > > > > >>>>>>>>>> seems like what Accumulo has is abandonware from a > project > > > > > >>>>>>> perspective. 
>>>>>>>>>>>>>>> There is no official documentation on how to perform encryption at rest, and the best information from its status comes from year (or greater) old ticket comments about how the feature is still experimental. Recently there was a talk that described using HDFS encryption zones as an alternative.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From my perspective, this is what I see as the current situation:
>>>>>>>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
>>>>>>>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or marketed capabilities
>>>>>>>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira comments or presentations
>>>>>>>>>>>>>>> 4- A viable alternative exists that appears to have feature parity in HDFS encryption
>>>>>>>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend beyond what HDFS provides
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
>>>>>>>>>>>>>>> Personally, I see two options:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1- Start going down a path to bring the feature into the forefront and start providing feature parity with HBase
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption offerings
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any input is welcomed & appreciated!
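Christopher's seek concern in this thread rests on each indexed RFile block being encrypted on its own, so that a reader positioned by the index decrypts only the block it lands on. The following is a minimal Java sketch of that general idea, assuming AES-GCM with a fresh 96-bit IV per block. The cipher mode is a swapped-in choice for illustration, not necessarily what Accumulo's built-in module uses, and the class and method names (`BlockEncryptionSketch`, `encryptBlock`, `decryptBlock`) are hypothetical rather than Accumulo's CryptoModule API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/**
 * Hypothetical sketch (not Accumulo's CryptoModule): every data block is
 * encrypted independently with its own random IV, so a seek that lands on
 * block k only has to decrypt block k.
 */
public class BlockEncryptionSketch {
  static final int IV_LEN = 12;     // 96-bit IV, the usual size for AES-GCM
  static final int TAG_BITS = 128;  // GCM authentication tag length in bits
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Encrypts one block; the stored layout is [IV | ciphertext + tag]. */
  static byte[] encryptBlock(SecretKey key, byte[] plain) throws GeneralSecurityException {
    byte[] iv = new byte[IV_LEN];
    RANDOM.nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
    byte[] ct = cipher.doFinal(plain);
    byte[] out = new byte[IV_LEN + ct.length];
    System.arraycopy(iv, 0, out, 0, IV_LEN);
    System.arraycopy(ct, 0, out, IV_LEN, ct.length);
    return out;
  }

  /** Decrypts a single stored block; no other block is read or decrypted. */
  static byte[] decryptBlock(SecretKey key, byte[] stored) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    // The IV is read back from the first IV_LEN bytes of the stored block.
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
    return cipher.doFinal(stored, IV_LEN, stored.length - IV_LEN);
  }

  public static void main(String[] args) throws GeneralSecurityException {
    KeyGenerator keyGen = KeyGenerator.getInstance("AES");
    keyGen.init(128);
    SecretKey key = keyGen.generateKey();

    // Stand-ins for three indexed data blocks of one file.
    List<byte[]> blocks = new ArrayList<>();
    for (String kv : new String[] {"row1 cf:cq v1", "row2 cf:cq v2", "row3 cf:cq v3"}) {
      blocks.add(encryptBlock(key, kv.getBytes(StandardCharsets.UTF_8)));
    }

    // "Seek" straight to block 1 and decrypt only that block.
    byte[] plain = decryptBlock(key, blocks.get(1));
    System.out.println(new String(plain, StandardCharsets.UTF_8)); // prints "row2 cf:cq v2"
  }
}
```

In a real file format the reader would take block offsets from the index and call something like `decryptBlock` only on the block a seek targets. This sketch does not measure anything. Comparing that cost against HDFS-level encryption is the random-seek experiment Keith describes above.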
