My main concern with using HDFS encryption instead of Accumulo's built-in implementation is seek performance. If we encrypt our indexed blocks independently (as we do now), I suspect our seeks would be more performant than if we relied on HDFS encryption, whose encrypted blocks may not fall on our index boundaries. If the difference is small, HDFS encryption might still be worth it for convenience and simpler maintenance, but I suspect the difference will be substantial.

On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <[email protected]> wrote:

> +1 I think this is the right step. My hunch is that some of the common
> data access patterns that we have in Accumulo (over HBase) is that the
> per-colfam encryption isn't quick as common a design pattern as it is
> for HBase (please tell me I'm wrong if anyone disagrees -- this is
> mostly a gut reaction). I think our users would likely benefit more from
> a per-namespace/table encryption control like you suggest.
>
> Implementing RFile encryption at HDFS level (e.g. tie a specific
> zone/key for a table) is probably straightforward. Changing the
> TServer's WAL use would likely be trickier to get right (a tserver would
> have multiple WALs, one for each unique zone/key from Tablet it happens
> to host). Maybe worrying about that is getting ahead of things -- just
> thought about it and figured I'd mention it :)
>
> William Slacum wrote:
> > Yup, #2. I also don't know if it's worth the effort for that specific
> > feature. It might be easier to add something like per-namespace and/or
> > per-table encryption, then define common access patterns for applications
> > that want to use multiple keys for encryption.
> >
> > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs<[email protected]> wrote:
> >
> >> Bill,
> >>
> >> Do you envision one of the following as the driver behind finer-grained
> >> encryption?:
> >>
> >> 1. We would only encrypt certain columns in order to get better
> >> performance;
> >>
> >> 2. We would use different keys on different columns in order to revoke
> >> access to a column via the key store;
> >>
> >> 3. We would only give a tablet server access to a subset of columns at any
> >> given time in order to protect something, and figure out what to do for
> >> compactions, etc.;
> >>
> >> 4. Something entirely different...
> >>
> >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> >> effort.
> >>
> >> Adam
> >> On Nov 4, 2015 7:38 PM, "William Slacum"<[email protected]> wrote:
> >>
> >>> @Adam, column family level encryption can be useful for multi-tenant
> >>> environments, and I think it maps pretty well to the document
> >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> >>> families and files. The built in RFile encryption scheme seems better
> >>> suited to this.
> >>>
> >>> @Christopher & Keith, it's something we can evaluate. Is there a good test
> >>> harness for just writing an RFile, opening a reader to it, and just poking
> >>> around? I was looking at the constructors and they didn't seem
> >>> straightforward enough for me to comprehend them within a few seconds.
> >>>
> >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner<[email protected]> wrote:
> >>>
> >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner<[email protected]> wrote:
> >>>>
> >>>>>
> >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum<[email protected]> wrote:
> >>>>>> Is "the code being 'at rest'" you making a funny about active development?
> >>>>>> Making sure I haven't lost my ability to get jokes :)
> >>>>>>
> >>>>>> I see two reasons why the code would be inactive: the feature is good
> >>>>>> enough as is or it's not interesting enough to attract attention.
> >>>>>> Considering it's not public API, there are no discussions to bring into
> >>>>>> the public API, and there's no effort to document how to use it, my intuition
> >>>>>> tells me that there isn't enough interest in it from a project
> >>>>>> perspective.
> >>>>>>
> >>>>>> From a user perspective, I've been getting asked about it when I work with
> >>>>>> Accumulo users.
> >>>>>> My recommendation, exclusively, is to use HDFS encryption
> >>>>>> because I can go to Hadoop's website and find documentation on it. When I
> >>>>>> go to find documentation on Accumulo's offerings, any usability
> >>>>>> information comes from vendor SlideShares. Most mentions of the feature on official
> >>>>>> Apache Accumulo channels echo Christopher's sentiments on the feature
> >>>>>> being experimental and not being officially recommended for use.
> >>>>>>
> >>>>>> I wouldn't want to rip out the feature first and then figure things out
> >>>>>> later. Sean already alluded to it, but a roadmap should contain something
> >>>>>> (tool or documentation) to help users migrate if we go down that route.
> >>>>>> What I'm trying to figure out is, when the question of "How do I do
> >>>>>> encryption at rest in Accumulo?" comes up, what is our community's answer?
> >>>>>> If we went down the route of using HDFS encryption zones, can we offer the
> >>>>>> same features? At the very least, we'd be offering the same database-level
> >>>>> Where does the decryption happen with DFS, is it in the DFS client? If
> >>>>> so, using HDFS level encryption seems to offer the same functionality???
> >>>>> Has anyone written a tool that takes an
> >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it is as an
> >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any
> >>>>> unexpected gotchas w/ this.
> >>>>>
> >>>> I was discussing my questions w/ Christopher today and he mentioned an
> >>>> experiment that I thought was interesting. What is the random seek
> >>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> >>>>
> >>>>>
> >>>>>> encryption scheme.
> >>>>>> I don't know the details of "more advanced key stores",
> >>>>>> but it seems like we could potentially take any custom implementation and
> >>>>>> map it to a KeyProvider [1]. I could also envision table level encryption
> >>>>>> being implementable via zones, but probably not down to the column family
> >>>>>> level.
> >>>>>>
> >>>>>> [1]
> >>>>>> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> >>>>>>
> >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs<[email protected]> wrote:
> >>>>>>> Responses inline.
> >>>>>>>
> >>>>>>> Adam
> >>>>>>>
> >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher"<[email protected]> wrote:
> >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it does is
> >>>>>>>> provide partial encryption-at-rest protection (unless you're running
> >>>>>>>> without walogs, and have good integration with some external secure key
> >>>>>>>> management faculty, and then it's probably fine).
> >>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery file.
> >>>>>>> That is a project we should take on, but it does not imply that the
> >>>>>>> existing features are not valuable. With HDFS encryption options this would
> >>>>>>> now be a much easier project to take on. Also, the users I know that use
> >>>>>>> encryption at rest do so with a more secure key store than the default.
> >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R don't necessarily
> >>>>>>>> realize its current shortcomings, or its lack of upstream maintenance
> >>>>>>>> support (which it has not been receiving).
> >>>>>>>> It may be the case that these
> >>>>>>>> users have support from an intermediary, and do understand the
> >>>>>>>> shortcomings... I don't know, but it's a concern.
> >>>>>>> Anybody that creates a secure system has to analyze the security of the
> >>>>>>> system as a whole. Accumulo's encryption at rest is one part of the
> >>>>>>> solution. Taking away the tool without providing an alternative does
> >>>>>>> nothing to improve the security of systems built on Accumulo.
> >>>>>>>
> >>>>>>>> 3. Correction: it has been an explicitly experimental feature and an
> >>>>>>>> incomplete one, which hasn't really been touched in two years, and has been
> >>>>>>>> explicitly excluded by the community for being public API because of its
> >>>>>>>> incompleteness. Age doesn't determine public API status. The community does.
> >>>>>>>
> >>>>>>> People are using it, so we have to consider the implications of whatever
> >>>>>>> changes we make and weigh against the benefits. I believe the last bug fix
> >>>>>>> was done this year, so I would argue it is being maintained. Changes to our
> >>>>>>> encryption at rest implementation will have consequences for those users.
> >>>>>>> There had better be a clear benefit if we break their systems.
> >>>>>>>
> >>>>>>>> 4. Has Accumulo's been evaluated for security and performance? By whom? Is
> >>>>>>>> it published?
> >>>>>>> Yes, there have been several talks at meetups and conferences that discuss
> >>>>>>> the security and performance of the current solution.
> >>>>>>>
> >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs<[email protected]> wrote:
> >>>>>>>>> There's another way to look at the state of Accumulo's encryption at rest:
> >>>>>>>>> 1. Encryption at rest works great for what it does, and the code being "at
> >>>>>>>>> rest" isn't necessarily a problem
> >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> >>>>>>>>> effectively in operations
> >>>>>>>>> 3. Encryption at rest has been a supported configuration option for over
> >>>>>>>>> two years with established plugin interfaces, and therefore it should be
> >>>>>>>>> considered part of the public API
> >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed for
> >>>>>>>>> performance or security
> >>>>>>>>>
> >>>>>>>>> The given option #2 would at least require an analysis of alternatives, and
> >>>>>>>>> we would have to decide what to do about backwards compatibility for users
> >>>>>>>>> using custom key stores and encryption strategies that may or may not be
> >>>>>>>>> supported by upstream alternatives.
> >>>>>>>>>
> >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to take up
> >>>>>>>>> projects to improve Accumulo's encryption. I think we're already going down
> >>>>>>>>> this path, but without having identified resources to do the improvements.
> >>>>>>>>> Any volunteers?
> >>>>>>>>>
> >>>>>>>>> Adam
> >>>>>>>>>
> >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum<[email protected]> wrote:
> >>>>>>>>>> So I've been looking into options for providing encryption at rest, and it
> >>>>>>>>>> seems like what Accumulo has is abandonware from a project perspective.
> >>>>>>>>>> There is no official documentation on how to perform encryption at rest,
> >>>>>>>>>> and the best information from its status comes from year (or greater) old
> >>>>>>>>>> ticket comments about how the feature is still experimental. Recently there
> >>>>>>>>>> was a talk that described using HDFS encryption zones as an alternative.
> >>>>>>>>>> From my perspective, this is what I see as the current situation:
> >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or marketed
> >>>>>>>>>> capabilities
> >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira comments
> >>>>>>>>>> or presentations
> >>>>>>>>>> 4- A viable alternative exists that appears to have feature parity in HDFS
> >>>>>>>>>> encryption
> >>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend beyond what
> >>>>>>>>>> HDFS provides
> >>>>>>>>>>
> >>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
> >>>>>>>>>> Personally, I see two options:
> >>>>>>>>>>
> >>>>>>>>>> 1- Start going down a path to bring the feature into the forefront and
> >>>>>>>>>> start providing feature parity with HBase
> >>>>>>>>>>
> >>>>>>>>>> or
> >>>>>>>>>>
> >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption offerings
> >>>>>>>>>> Any input is welcomed & appreciated!
